Show HN: Llmswap – Python package to reduce LLM API costs by 50-90% with caching
I built llmswap to solve a problem I kept hitting in hackathons: burning through API credits while testing the same prompts repeatedly during development.
It's a simple Python package that provides a unified interface for OpenAI, Anthropic, Google Gemini, and local models (Ollama), with built-in response caching that can cut API costs by 50-90%.
Key features:

- Intelligent caching with TTL and memory limits
- Context-aware caching for multi-user apps
- Auto-fallback between providers when one fails (see the sketch below)
- Zero configuration: works with environment variables
from llmswap import LLMClient

client = LLMClient(cache_enabled=True)

response = client.query("Explain quantum computing")  # first call hits the provider API
response = client.query("Explain quantum computing")  # identical query returns from cache instantly (free)
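For the auto-fallback, the idea is simply to try providers in order and move on when one errors out. Here's a simplified sketch of the pattern (illustrative only, not the actual implementation; the provider ordering and error handling below are assumptions):

# Illustrative sketch of the fallback pattern (not llmswap's internals):
def query_with_fallback(prompt, clients):
    # 'clients' is an ordered list, e.g. one LLMClient per configured provider.
    last_error = None
    for client in clients:
        try:
            return client.query(prompt)
        except Exception as exc:  # real code would catch provider-specific errors
            last_error = exc
    raise last_error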
Caching is disabled by default for security. When enabled, it's thread-safe and includes context isolation for multi-user applications (see the sketch below).

Built this from components of a hackathon project. Already at 2.2k downloads on PyPI. Hope it helps others save on API costs during development.
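To make context isolation concrete: cache entries are keyed on the request context as well as the prompt, so a response cached for one user is never served to another. A simplified, illustrative sketch of that keying (the real cache key presumably also covers provider and model settings):

import hashlib
import json

# Illustrative sketch of context-aware cache keying (not llmswap's actual code):
def cache_key(prompt, context=None):
    # Fold the user/tenant context into the key so responses cached
    # for one user are never returned to another.
    payload = json.dumps({"prompt": prompt, "context": context}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

key_a = cache_key("What's my balance?", context={"user_id": "alice"})
key_b = cache_key("What's my balance?", context={"user_id": "bob"})
assert key_a != key_b  # same prompt, different users -> separate cache entries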
GitHub: https://github.com/sreenathmmenon/llmswap
PyPI: https://pypi.org/project/llmswap/