10,500 tokens/SEC per request on Nvidia hardware

2 points · bhaktatejas922 · 9/15/2025, 6:19:47 PM · morphllm.com ↗

Comments (3)

bhaktatejas922 · 1h ago
We just doubled our speculative code edit throughput and hit 10k tok/sec per request!

Morph now merges code at 10,500 tokens/sec — roughly 4× faster than the best speeds on Cerebras.

That kind of speed makes previously impractical workloads trivial: applying complex edits across a 40k-token document now takes under 4 seconds (40,000 tokens ÷ 10,500 tok/sec ≈ 3.8 s). This isn't a vanity metric; we think it unlocks an entirely new domain of AI use cases where codebases, configs, or long documents can be semantically edited in real time.

Morph is a Fast Apply model dedicated to merging edits from frontier LLMs. We want to enable developers to build real-time interfaces with AI.
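
For readers who haven't used a Fast Apply model, here is a minimal sketch of the flow, assuming an OpenAI-compatible chat endpoint: a frontier model produces a "lazy" edit snippet, and the fast-apply model re-emits the full file with that edit merged in. The base URL, model name, and prompt tags below are placeholders, not Morph's documented API.

```python
# Minimal sketch of a fast-apply merge call. Assumes an OpenAI-compatible
# endpoint; the base_url, model name, and <code>/<update> tags are
# placeholders, not necessarily Morph's documented API.
from openai import OpenAI

client = OpenAI(base_url="https://api.example-fast-apply.com/v1", api_key="...")

# Full file contents; this can be tens of thousands of tokens.
original_file = open("server.py").read()

# "Lazy" edit produced by a frontier model: only the changed lines,
# with markers standing in for the untouched regions.
lazy_edit = """\
# ... existing code ...
def handle_request(req):
    validate(req)  # newly added line
    return route(req)
# ... existing code ...
"""

# The fast-apply model rewrites the entire file with the edit merged in,
# so end-to-end latency is dominated by output tokens/sec.
resp = client.chat.completions.create(
    model="fast-apply-model",  # placeholder model name
    messages=[{
        "role": "user",
        "content": f"<code>{original_file}</code>\n<update>{lazy_edit}</update>",
    }],
)
merged_file = resp.choices[0].message.content
```

Because the merge re-emits the whole file, output throughput determines end-to-end latency, which is why tokens/sec is the headline figure.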

NitpickLawyer · 1h ago
Help me understand. Is this for cases where you have a file and you "ask" an LLM to change something, and they reply in chat mode with something like < //--unchanged code \n changed line \n changed line \n //----remaining code unchanged > ?

If so, isn't this flow like 6 months old, and not really used anymore? The latest tools (terminal-based agents and VS Code extensions like cline/roo/kilo) already support "diff edits", where the model outputs a diff format that the tool speaks. I get "instant" edits that way, right in my IDE, and model support has been great (gpt5, claude4, gemini2.5, grok-fast-1, etc.).

So what's the use case of this model, then? Cool technical results, and congrats, but it seems the "field" has already solved for this particular problem?
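
For contrast with the lazy-rewrite snippet a fast-apply model consumes, here is an illustrative search/replace-style "diff edit" of the kind this comment describes, where the IDE tool applies the change itself rather than calling a second model. The syntax is invented for illustration; each tool defines its own format.

```python
# Illustrative search/replace-style diff edit; the syntax is made up for
# illustration and is not copied from any specific tool.
diff_edit = """\
<<<<<<< SEARCH
def handle_request(req):
    return route(req)
=======
def handle_request(req):
    validate(req)
    return route(req)
>>>>>>> REPLACE
"""
```

The trade-off being debated in the thread: diff edits are applied deterministically by the tool, while fast-apply regenerates the merged file with a second, much faster model.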

anon191928 · 1h ago
SEC? They will be against this. They have been against financial innovation, and if they see this they will be against it too. SEC is special. sec is ok.