High-performance client for Baseten.co

7 points · mich5632 · 1 comment · 6/13/2025, 4:58:21 PM · github.com

Comments (1)

mich5632 · 11h ago
We wrote a Rust (PyO3) client for servers compatible with the OpenAI embeddings API (openai.com, or Infinity, TEI, vLLM, SGLang). Most server-side ML infrastructure auto-scales with the workload, so for embedding workloads the server is no longer the bottleneck; the bottleneck has shifted to the client. In Python, the client is held back by the global interpreter lock (GIL). With the performance package, we release the GIL during requests, so other Python threads stay free to do work in the meantime, such as querying your VectorDB.
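To illustrate why releasing the GIL during requests matters, here is a minimal Python sketch of the concurrency pattern it enables. It is not the package's API: `fake_embed_request`, `chunk`, and `embed_all` are hypothetical names, and `time.sleep` stands in for a network call to an OpenAI-compatible embeddings endpoint because, like a native client that releases the GIL, it lets other Python threads run while it waits. With 32 inputs split into batches of 4 and 8 worker threads, the 8 requests overlap and finish in roughly one request's latency instead of eight.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def chunk(items, batch_size):
    """Split a list of inputs into batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def fake_embed_request(batch, latency_s=0.1):
    """Hypothetical stand-in for one request to an embeddings endpoint.

    time.sleep releases the GIL while it waits, similar to a native client
    that releases the GIL for the duration of a request.
    """
    time.sleep(latency_s)
    # Dummy one-dimensional "embedding" per input: the text length.
    return [[float(len(text))] for text in batch]


def embed_all(texts, batch_size=4, max_workers=8):
    """Send all batches concurrently and return embeddings in input order."""
    batches = chunk(texts, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input batches.
        results = pool.map(fake_embed_request, batches)
        return [vec for batch_result in results for vec in batch_result]


if __name__ == "__main__":
    texts = [f"document {i}" for i in range(32)]
    start = time.perf_counter()
    vectors = embed_all(texts)
    elapsed = time.perf_counter() - start
    print(len(vectors), f"{elapsed:.2f}s")
```

If each call instead held the GIL for its whole duration, the threads would effectively run one after another and the total time would grow with the number of batches.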