Ask HN: How to increase LLM inference speed?
1 point by InkCanon on 6/15/2025, 10:08:28 AM
Hi HN,
I'm building software with a very tight feedback loop with the user. One part involves a short (a few hundred tokens) response from an LLM, and this is by far the biggest UX problem: DeepSeek's total response time can currently reach 10 seconds, which is horrific. Is it practical to get the latency down to roughly 2 seconds? The task is just to rephrase a short text while preserving its meaning, so the model doesn't need to be SOTA; faster inference matters far more than output quality here.
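For context, here is a minimal sketch of the kind of call I'm timing (not my exact code). It assumes DeepSeek's OpenAI-compatible chat endpoint; the API key, model choice, system prompt, and user text are placeholders:

    # Sketch of the call being timed; key, model, and prompts are placeholders.
    # DeepSeek exposes an OpenAI-compatible chat API at api.deepseek.com.
    import time
    from openai import OpenAI

    client = OpenAI(
        api_key="sk-...",  # placeholder
        base_url="https://api.deepseek.com",
    )

    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system",
             "content": "Rephrase the user's text while preserving its meaning."},
            {"role": "user", "content": "<a few hundred tokens of text>"},
        ],
        max_tokens=512,  # the response is short, so cap it
    )
    print(f"total latency: {time.perf_counter() - start:.2f}s")
    print(resp.choices[0].message.content)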