Fastgen – SOTA LLM inference in 3k lines of Python
3 mpu 1 5/16/2025, 2:50:21 PM github.com
Comments (1)
mpu · 7h ago
We just released a tiny (~3k LOC) Python library that implements state-of-the-art inference algorithms on GPUs and provides performance similar to vLLM. We believe it's a great learning vehicle for inference techniques, and the code is quite easy to hack on!