Tokasaurus: An LLM inference engine for high-throughput workloads
213 points by rsehrlich on 6/5/2025, 9:27:07 PM | 23 comments | scalingintelligence.stanford.edu
https://github.com/ScalingIntelligence/tokasaurus/blob/65efb...
I’m honestly impressed that a pure Python implementation can beat vLLM and SGLang. Granted, they lean on FlashInfer, and of course torch.compile has gotten incredibly powerful in the last few years. Dynamic shapes have still been a huge thorn in my side, though, so I’ll need to look closer at how they pulled it off…
In addition to Dev Discuss, a number of core contributors are also active on Twitter. Two particularly helpful and prolific voices are @ezyang and @cHHillee.
Finally, don’t overlook GitHub issues—they’re a surprisingly effective way to start conversations. If you’ve found a bug or have ideas on how to improve the APIs, opening an issue is always welcome.
Looks like they don't compare to TensorRT-LLM throughput numbers which, last I checked, are SOTA in open source.
Generation benchmark was 5% faster than SGLang.
Also, this seems very useful for generating synthetic data or labelling a bunch of data. 6k batch size is small for data labelling.
But still, I mainly see work in this direction in academia.
This is important for "soft realtime" applications, where you do not need an instant response but someone is still waiting.
If latency is really high, then it can basically only be used for background processes.
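To make that trade-off concrete, here is a back-of-envelope sketch in Python. The helper name and all the numbers are made up for illustration; they are not Tokasaurus measurements. It also assumes a deliberately simplistic model in which a request only finishes when its whole batch has been generated.

```python
def classify_workload(batch_size: int, tokens_per_request: int,
                      throughput_tok_s: float, patience_s: float) -> str:
    """Classify a batched workload by how long a single user would wait.

    Simplifying assumption: a request completes only when its entire
    batch has been generated, so waiting time equals batch time.
    throughput_tok_s is the aggregate tokens/second across the batch.
    """
    batch_time_s = batch_size * tokens_per_request / throughput_tok_s
    if batch_time_s <= patience_s:
        return "soft realtime"
    return "background only"


# Hypothetical numbers, not benchmark results.
print(classify_workload(64, 256, 8192.0, 5.0))    # small batch, short wait
print(classify_workload(4096, 256, 8192.0, 5.0))  # huge batch, long wait
```

Under these assumptions, the same aggregate throughput can land on either side of the line depending on how large a batch each request has to wait for, which is why a throughput number alone does not say whether someone can reasonably sit and wait for the answer.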
I am hoping to use this “Tokasaurus” nickname with affection for my neighbors. If Stanford is ok with informal usage.
Success with Meta AI / Llama 4:
Hey Meta, I would like to see an image of a Tyrannosaurus Rex, who is clad in a leather jacket, sunglasses, and fedora. He is so cool looking, and smoking a joint of marijuana, and his image is superimposed against a skyline of Phoenix in the golden glow of sunset.
Can you light up the joint with a glowing tip?
Because Tokasaurus was mentioned as better than Ollama for conducting Darwin Gödel Machine operations (self-improvement), I looked for the linked repo on GitHub and it was a 404. So glad it is back: https://github.com/ScalingIntelligence/tokasaurus
If there is anything here worth using, it's entirely possible that the llama.cpp crew can save it from vanishing into obscurity.