Show HN: Llmswap – Python package to reduce LLM API costs by 50-90% with caching
12 points by sreenathmenon on 8/10/2025, 4:16:59 PM | 6 comments | pypi.org
I built llmswap to solve a problem I kept hitting in hackathons - burning through API credits while testing the same prompts repeatedly during development.
It's a simple Python package that provides a unified interface for OpenAI, Anthropic, Google Gemini, and local models (Ollama), with built-in response caching that can cut API costs by 50-90%.
Key features:
- Intelligent caching with TTL and memory limits
- Context-aware caching for multi-user apps
- Auto-fallback between providers when one fails
- Zero configuration - works with environment variables
from llmswap import LLMClient
client = LLMClient(cache_enabled=True)
response = client.query("Explain quantum computing")
# Second identical query returns from cache instantly (free)
Caching is disabled by default for security. When enabled, it's thread-safe and includes context isolation for multi-user applications.

I built this from components of a hackathon project. It's already at 2.2k downloads on PyPI. Hope it helps others save on API costs during development.
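To give a sense of what context isolation means, here's a simplified sketch of the idea (an illustration only, not llmswap's actual implementation - the class, parameter names, and eviction policy below are made up): the cache key hashes the prompt together with a context identifier, so identical prompts from different users never share an entry.

import hashlib, json, threading, time

class ContextCache:
    def __init__(self, ttl=3600, max_entries=1000):
        self.ttl = ttl
        self.max_entries = max_entries
        self.lock = threading.Lock()
        self.store = {}  # key -> (expires_at, response)

    def _key(self, prompt, context):
        # Hash prompt + context together so identical prompts from
        # different users/tenants never share a cache entry.
        raw = json.dumps({"prompt": prompt, "context": context}, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, prompt, context=None):
        with self.lock:
            entry = self.store.get(self._key(prompt, context))
            if entry and entry[0] > time.time():
                return entry[1]
            return None  # miss or expired

    def set(self, prompt, response, context=None):
        with self.lock:
            if len(self.store) >= self.max_entries:
                # Naive FIFO eviction to respect the memory limit
                self.store.pop(next(iter(self.store)))
            self.store[self._key(prompt, context)] = (time.time() + self.ttl, response)

cache = ContextCache(ttl=600)
cache.set("Explain quantum computing", "cached answer", context={"user_id": 42})
cache.get("Explain quantum computing", context={"user_id": 7})  # None - different user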
GitHub: https://github.com/sreenathmmenon/llmswap
PyPI: https://pypi.org/project/llmswap/
Actually, this package started from a hackathon project - a RAG system over internal documentation plus MCP - where I was burning through Anthropic API credits.

There were questions that got repeated several times; the 50%+ figure comes from that experience. Based on this, I was thinking of use cases like these (a sketch of the savings mechanism follows the list):
Multi-User Support/FAQ Systems:
- How do I reset my password?
- Reset password steps?
- Forgot my password help
- Password reset procedure
RAG based:
- How to configure VM?
- How to deploy?
- How to create a network?
Plus educational/training apps, developer testing scenarios, etc.
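As a rough illustration of where the savings come from in those scenarios: each distinct question pays for one API call, and every exact repeat is served from cache for free (call_llm below is a placeholder for whatever provider call you use, not llmswap's API). Note that an exact-match cache like this only catches identical strings - the paraphrased variants above would each miss unless the keys are normalized first.

import functools

def call_llm(question):
    # Stand-in for a real provider call (placeholder only).
    return "answer to: " + question

@functools.lru_cache(maxsize=1024)
def answer(question):
    # Only the first occurrence of each exact question hits the API.
    return call_llm(question)

for q in ["How do I reset my password?"] * 10:
    answer(q)  # 1 API call, 9 cache hits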
You're absolutely right that apps with unique queries won't see these benefits. This won't help with:
- Personalized content
- Real-time data
- User-specific queries
- Creative generation
and other scenarios like these.
I think I should clarify this in the docs. Thanks for the great feedback. This is my first open-source package and my first conversation on Hacker News - great to interact with and learn from all of you.
The most popular of those probably being Redis.
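For anyone who wants cross-process caching along those lines, the same pattern maps onto Redis in a few lines with redis-py (a sketch; llm_call is a placeholder for your provider call):

import hashlib
import redis

r = redis.Redis(host="localhost", port=6379)

def cached_query(prompt, llm_call, ttl=3600):
    key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return hit.decode()  # cache hit: no API call
    response = llm_call(prompt)
    r.setex(key, ttl, response)  # expire the entry after ttl seconds
    return response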