Compute Where It Counts: a trainable LLM sparsity enabling 4x CPU speed

2 points · cyris · 8/20/2025, 6:58:01 PM · crystalai.org

Comments (1)

cyris · 6h ago
Something I've been working on with friends at crystalai.org !

Introducing CWIC, a trainable LLM sparsity paradigm that beats SOTA methods, enabling 80% sparsity and 4x+ speedups on CPU. It works on models as small as 1B parameters, outperforming TEAL, R-Sparse, and friends. We are releasing code at https://github.com/crystal-ai-org/cwic. If you're interested in our work, feel free to reach out at https://x.com/crystalAIorg; we love collaboration!
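For readers unfamiliar with activation sparsity, here's a minimal sketch (not the CWIC implementation, just an illustration of the general idea) of why skipping near-zero activations cuts CPU compute: if only 20% of a layer's input activations are kept, the matrix-vector product only needs 20% of its multiply-accumulates.

```python
import numpy as np

# Illustrative sketch of activation sparsity, NOT the CWIC method:
# keep only the largest-magnitude activations and skip the weight
# columns paired with the rest.
rng = np.random.default_rng(0)
d_in, d_out = 1024, 1024
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
x = rng.standard_normal(d_in).astype(np.float32)

# 80% sparsity: keep the top 20% of activations by magnitude.
k = int(0.2 * d_in)
keep = np.argsort(np.abs(x))[-k:]

dense = W @ x                      # full matvec: d_out * d_in MACs
sparse = W[:, keep] @ x[keep]      # sparse matvec: only 20% of the MACs

rel_err = np.linalg.norm(dense - sparse) / np.linalg.norm(dense)
print(f"relative error at 80% sparsity: {rel_err:.3f}")
```

On real LLM activations (which are far more concentrated than this Gaussian toy), the approximation error is much smaller; the trainable part of approaches like CWIC is learning where compute can be skipped rather than using a fixed magnitude rule.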