Teaching a small embedding model with LLMs to deliver GPT-like semantics in 10ms

Submitted by mattmmm on 9/8/2025, 5:35:40 PM · instantdomainsearch.com

Comments (1)

mattmmm · 13h ago
Hey folks! I’m Matt, CEO at Instant Domain Search. Quick summary: we distilled LLM judgments into a 22.7M-parameter embedding model and optimized CPU inference to deliver sub-10ms latency for semantic domain matches (correlation ≈0.87 with GPT-4).

The post walks through our training signal, distillation choices, quantization, index layout, and what we learned about latency and CPU usage in production.
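On the quantization piece: the simplest version (a guess at the general technique, not the post's exact scheme) is symmetric per-vector int8 quantization, which lets you score candidates with cheap integer dot products on CPU.

```python
import numpy as np

def quantize_int8(emb: np.ndarray):
    """Symmetric per-vector int8 quantization: store one float scale
    per embedding plus an int8 array (4x smaller than float32)."""
    scale = float(np.abs(emb).max()) / 127.0  # assumes a nonzero vector
    q = np.round(emb / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 embedding."""
    return q.astype(np.float32) * scale

def int8_dot(q1, s1, q2, s2) -> float:
    """Approximate float dot product from int8 codes: accumulate in
    int32, then rescale by the two per-vector scales."""
    return int(q1.astype(np.int32) @ q2.astype(np.int32)) * s1 * s2

emb = np.linspace(-1.0, 1.0, 16).astype(np.float32)
q, s = quantize_int8(emb)
approx = dequantize(q, s)
```

The per-element error is bounded by half the scale step, which is usually negligible for ranking.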

We’re a small team of 4 engineers building free, wicked fast search tools. AMA or feedback welcome!