Show HN: oLLM – LLM Inference for large-context tasks on consumer GPUs
3 points by anuarsh | 8/28/2025, 11:19:57 PM | github.com
Comments (3)
attogram · 41m ago
"~20 min for the first token" might turn off some people. But it is totally worth it to get such a large context size on puny systems!
anuarsh · 16m ago
Absolutely. There are tons of cases where an interactive experience isn't required, but the ability to process a large context and pull insights out of it is.
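For instance, a minimal sketch of that kind of offline job, using Hugging Face Transformers' disk offload as a stand-in (oLLM's own API differs; see the repo README). The model id and paths are placeholders:

    # Offline batch job where a long time-to-first-token is fine.
    # Stand-in sketch: Transformers + accelerate disk offload, not
    # oLLM's actual API; model id and paths are placeholders.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder

    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL,
        torch_dtype=torch.float16,
        device_map="auto",         # spill layers to CPU/disk as VRAM fills
        offload_folder="offload",  # SSD directory for offloaded weights
    )

    def summarize(document: str) -> str:
        prompt = "Summarize the key findings:\n\n" + document
        inputs = tok(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=512)
        return tok.decode(out[0][inputs.input_ids.shape[1]:],
                          skip_special_tokens=True)

    # Queue a folder of long reports overnight; nobody is waiting
    # on the first token.

The point is the shape of the workload: throughput over latency, so a slow prefill stops mattering.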
anuarsh · 1h ago
Hi everyone, any comments or questions are appreciated.