Show HN: oLLM – LLM Inference for large-context tasks on consumer GPUs

3 points by anuarsh | 6 comments | 8/28/2025, 11:19:57 PM | github.com

Comments (6)

Haeuserschlucht · 2h ago
20 minutes is a huge turnoff, unless you have it run overnight... only to get the hint in the morning that you should exercise self-care, when you presented a legal paper and had the AI check it for flaws.
Haeuserschlucht · 1h ago
It's better to have software erase all private details from the text, have it checked by a cloud AI, and then have all the placeholders replaced back on your hard drive.
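
That round trip is easy to sketch. Below is a minimal, hypothetical Python version: the regex patterns and the call_cloud_llm stub are placeholders (real PII detection would want something like Presidio), but it shows the placeholder-and-restore flow, with the mapping never leaving your machine.

    import re

    # Hypothetical sketch: swap obvious private details for placeholders,
    # send the sanitized text to a cloud LLM, restore the originals locally.
    PATTERNS = {
        "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
        "PHONE": r"\+?\d[\d\s().-]{7,}\d",
    }

    def redact(text: str) -> tuple[str, dict[str, str]]:
        mapping: dict[str, str] = {}
        for label, pattern in PATTERNS.items():
            # dict.fromkeys dedupes matches while preserving order
            for i, match in enumerate(dict.fromkeys(re.findall(pattern, text))):
                placeholder = f"[{label}_{i}]"
                mapping[placeholder] = match
                text = text.replace(match, placeholder)
        return text, mapping

    def restore(text: str, mapping: dict[str, str]) -> str:
        for placeholder, original in mapping.items():
            text = text.replace(placeholder, original)
        return text

    def call_cloud_llm(prompt: str) -> str:
        raise NotImplementedError  # stand-in for the cloud API call

    sanitized, mapping = redact("Contact Jane at jane@example.com or +1 555 123 4567.")
    # answer = restore(call_cloud_llm(sanitized), mapping)  # mapping stays on disk
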
attogram · 7h ago
"~20 min for the first token" might turn off some people. But it is totally worth it to get such a large context size on puny systems!
anuarsh · 7h ago
Absolutely. There are tons of cases where an interactive experience isn't required, but the ability to process a large context for insights is.
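
As an unofficial illustration of that batch-style use, here is a sketch using Hugging Face transformers with CPU/disk offload as a stand-in, since oLLM's own API isn't quoted in this thread; the model name and input file are placeholders.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model; device_map="auto" plus an offload folder spills
    # weights to CPU RAM/disk so a consumer GPU can hold the working set.
    # Slow, but acceptable for an overnight batch job.
    MODEL = "meta-llama/Llama-3.1-8B-Instruct"

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL, device_map="auto", offload_folder="offload"
    )

    with open("contract.txt") as f:  # placeholder input document
        prompt = "Summarize the risks in this contract:\n\n" + f.read()

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
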
attogram · 23m ago
It would be interesting to see some benchmarks of this vs., for example, Ollama running locally with no timeout.
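
One simple benchmark would be time-to-first-token. A rough sketch against a local Ollama server, using its standard streaming /api/generate endpoint (model name and prompt are placeholders; timeout=None matches the "no timeout" condition above):

    import json
    import time
    import requests

    # Measure time-to-first-token from a local Ollama server (default port 11434).
    def ttft(model: str, prompt: str) -> float:
        start = time.perf_counter()
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": True},
            stream=True,
            timeout=None,  # no timeout
        )
        for line in resp.iter_lines():
            # each streamed line is a JSON chunk; "response" holds token text
            if line and json.loads(line).get("response"):
                return time.perf_counter() - start
        raise RuntimeError("no tokens returned")

    print(f"TTFT: {ttft('llama3.1', 'Summarize this long document...'):.1f}s")
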
anuarsh · 8h ago
Hi everyone, any comments or questions are appreciated