Show HN: oLLM – LLM Inference for large-context tasks on consumer GPUs

3 points by anuarsh | 6 comments | 8/28/2025, 11:19:57 PM | github.com

Comments (6)

Haeuserschlucht · 2h ago
20 minutes is a huge turnoff, unless you have it run overnight... only to get the hint in the morning that you should exercise self-care, when you presented a legal paper and had the AI check it for flaws.
Haeuserschlucht · 1h ago
It's better to have software erase all private details from the text, have it checked by a cloud AI, and then have all the placeholders replaced back on your hard drive.
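
That round trip is easy to sketch. Below is a minimal, hypothetical Python version: the regex patterns and the call_cloud_llm stub are placeholders (real PII detection would want something like Presidio), but it shows the placeholder-and-restore flow, with the mapping never leaving your machine.

    import re

    # Hypothetical sketch: swap obvious private details for placeholders,
    # send the sanitized text to a cloud LLM, restore the originals locally.
    PATTERNS = {
        "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
        "PHONE": r"\+?\d[\d\s().-]{7,}\d",
    }

    def redact(text: str) -> tuple[str, dict[str, str]]:
        mapping: dict[str, str] = {}
        for label, pattern in PATTERNS.items():
            # dict.fromkeys dedupes matches while preserving order
            for i, match in enumerate(dict.fromkeys(re.findall(pattern, text))):
                placeholder = f"[{label}_{i}]"
                mapping[placeholder] = match
                text = text.replace(match, placeholder)
        return text, mapping

    def restore(text: str, mapping: dict[str, str]) -> str:
        for placeholder, original in mapping.items():
            text = text.replace(placeholder, original)
        return text

    def call_cloud_llm(prompt: str) -> str:
        raise NotImplementedError  # stand-in for the cloud API call

    sanitized, mapping = redact("Contact Jane at jane@example.com or +1 555 123 4567.")
    # answer = restore(call_cloud_llm(sanitized), mapping)  # mapping stays on disk
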
attogram · 7h ago
"~20 min for the first token" might turn off some people. But it is totally worth it to get such a large context size on puny systems!
anuarsh · 7h ago
Absolutely. There are tons of cases where an interactive experience isn't required, but the ability to process a large context for insights is.
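
As an unofficial illustration of that batch-style use, here is a sketch using Hugging Face transformers with CPU/disk offload as a stand-in, since oLLM's own API isn't quoted in this thread; the model name and input file are placeholders.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model; device_map="auto" plus an offload folder spills
    # weights to CPU RAM/disk so a consumer GPU can hold the working set.
    # Slow, but acceptable for an overnight batch job.
    MODEL = "meta-llama/Llama-3.1-8B-Instruct"

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL, device_map="auto", offload_folder="offload"
    )

    with open("contract.txt") as f:  # placeholder input document
        prompt = "Summarize the risks in this contract:\n\n" + f.read()

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
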
attogram · 23m ago
It would be interesting to see some benchmarks of this vs., for example, Ollama running locally with no timeout.
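
One simple benchmark would be time-to-first-token. A rough sketch against a local Ollama server, using its standard streaming /api/generate endpoint (model name and prompt are placeholders; timeout=None matches the "no timeout" condition above):

    import json
    import time
    import requests

    # Measure time-to-first-token from a local Ollama server (default port 11434).
    def ttft(model: str, prompt: str) -> float:
        start = time.perf_counter()
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": True},
            stream=True,
            timeout=None,  # no timeout
        )
        for line in resp.iter_lines():
            # each streamed line is a JSON chunk; "response" holds token text
            if line and json.loads(line).get("response"):
                return time.perf_counter() - start
        raise RuntimeError("no tokens returned")

    print(f"TTFT: {ttft('llama3.1', 'Summarize this long document...'):.1f}s")
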
anuarsh · 8h ago
Hi everyone, any comments or questions are appreciated