Show HN: Llmswap – Python package to reduce LLM API costs by 50-90% with caching

9 points by sreenathmenon · 1 comment · 8/10/2025, 4:16:59 PM · pypi.org ↗
I built llmswap to solve a problem I kept hitting in hackathons - burning through API credits while testing the same prompts repeatedly during development.

It's a simple Python package that provides a unified interface for OpenAI, Anthropic, Google Gemini, and local models (Ollama), with built-in response caching that can cut API costs by 50-90% when the same prompts are sent repeatedly.

Key features:

- Intelligent caching with TTL and memory limits
- Context-aware caching for multi-user apps
- Auto-fallback between providers when one fails
- Zero configuration: works with environment variables

  from llmswap import LLMClient

  client = LLMClient(cache_enabled=True)
  # First call hits the provider API and stores the response in the cache
  response = client.query("Explain quantum computing")

  # Second identical query returns from cache instantly (free)
  response = client.query("Explain quantum computing")
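For the TTL, memory-limit, and fallback features, here's a rough sketch of what configuration could look like. The keyword names (fallback_providers, cache_ttl, cache_max_size_mb) are illustrative assumptions, not the documented API; check the README for the real signatures.

  from llmswap import LLMClient

  # Parameter names below are assumptions for illustration; see the README.
  client = LLMClient(
      provider="anthropic",            # assumed: primary provider
      fallback_providers=["openai"],   # assumed: tried if the primary call fails
      cache_enabled=True,
      cache_ttl=3600,                  # assumed: entries expire after one hour
      cache_max_size_mb=100,           # assumed: bound on in-memory cache size
  )

  response = client.query("Summarize this stack trace: ...")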
The caching is disabled by default for security. When enabled, it's thread-safe and includes context isolation for multi-user applications.
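To illustrate the context isolation, a minimal sketch, assuming a cache_context keyword exists for scoping cache entries per user (the actual parameter name may differ):

  from llmswap import LLMClient

  client = LLMClient(cache_enabled=True)

  # cache_context is an assumed keyword for per-user cache scoping.
  # Identical prompts from different users get separate cache entries,
  # so one user's cached response is never served to another.
  a = client.query("What's on my calendar today?", cache_context={"user_id": "alice"})
  b = client.query("What's on my calendar today?", cache_context={"user_id": "bob"})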

Built this from components of a hackathon project. Already at 2.2k downloads on PyPI. Hope it helps others save on API costs during development.

GitHub: https://github.com/sreenathmmenon/llmswap
PyPI: https://pypi.org/project/llmswap/

Comments (1)

rav · 29m ago
How is it "50-90%" savings? If a given application doesn't repeat its queries, surely there's nothing to save by caching the responses?