High performance client for Baseten.co
7 points by mich5632 | 6/13/2025, 4:58:21 PM | github.com ↗
Comments (1)
mich5632 · 11h ago
We wrote a Rust PyO3 client for OpenAI-embeddings-compatible servers (openai.com, or infinity, TEI, vLLM, SGLang).
Most server-side ML infrastructure auto-scales with the workload. For embedding workloads, the server is no longer the bottleneck; it has shifted to the client, which in Python is constrained by the global interpreter lock (GIL).
With this performance package, we release the GIL during requests, so the freed resources are available to do other work in parallel, such as querying your vector DB.
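A minimal sketch of that pattern, for readers curious how a PyO3 extension can drop the GIL around network I/O. This is not the actual client code: the function and module names and the choice of the ureq HTTP crate are illustrative assumptions.

    // Sketch: release the GIL around a blocking HTTP request in a PyO3 extension.
    // Assumed Cargo.toml dependencies:
    //   pyo3 = { version = "0.22", features = ["extension-module"] }
    //   ureq = "2"
    use pyo3::exceptions::PyRuntimeError;
    use pyo3::prelude::*;

    /// POST a JSON body to an OpenAI-compatible /v1/embeddings endpoint.
    /// The GIL is released for the duration of the request, so other Python
    /// threads (e.g. ones querying a vector DB) keep running.
    #[pyfunction]
    fn embed_raw(py: Python<'_>, url: String, json_body: String) -> PyResult<String> {
        // `allow_threads` drops the GIL while the closure runs; the closure
        // must not touch any Python objects.
        let result: Result<String, String> = py.allow_threads(move || {
            ureq::post(&url)
                .set("Content-Type", "application/json")
                .send_string(&json_body)
                .map_err(|e| e.to_string())
                .and_then(|resp| resp.into_string().map_err(|e| e.to_string()))
        });
        result.map_err(PyRuntimeError::new_err)
    }

    /// Python module entry point (hypothetical module name).
    #[pymodule]
    fn fast_embeddings(m: &Bound<'_, PyModule>) -> PyResult<()> {
        m.add_function(wrap_pyfunction!(embed_raw, m)?)?;
        Ok(())
    }

Because the blocking I/O runs with the GIL released, a Python thread pool can fan out many such requests concurrently instead of serializing them on the interpreter.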