Show HN: Llmswap – Python package to reduce LLM API costs by 50-90% with caching

9 points by sreenathmenon · 1 comment · 8/10/2025, 4:16:59 PM · pypi.org ↗
I built llmswap to solve a problem I kept hitting in hackathons - burning through API credits while testing the same prompts repeatedly during development.

It's a simple Python package that provides a unified interface for OpenAI, Anthropic, Google Gemini, and local models (Ollama), with built-in response caching that can cut API costs by 50-90% when the same prompts are sent repeatedly.

Key features:

- Intelligent caching with TTL and memory limits
- Context-aware caching for multi-user apps
- Auto-fallback between providers when one fails
- Zero configuration: works with environment variables

  from llmswap import LLMClient

  client = LLMClient(cache_enabled=True)
  # First call hits the provider API and stores the response in the cache
  response = client.query("Explain quantum computing")

  # Second identical query returns from cache instantly (free)
  response = client.query("Explain quantum computing")
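For the TTL, memory-limit, and fallback features, here's a rough sketch of what configuration could look like. The keyword names (fallback_providers, cache_ttl, cache_max_size_mb) are illustrative assumptions, not the documented API; check the README for the real signatures.

  from llmswap import LLMClient

  # Parameter names below are assumptions for illustration; see the README.
  client = LLMClient(
      provider="anthropic",            # assumed: primary provider
      fallback_providers=["openai"],   # assumed: tried if the primary call fails
      cache_enabled=True,
      cache_ttl=3600,                  # assumed: entries expire after one hour
      cache_max_size_mb=100,           # assumed: bound on in-memory cache size
  )

  response = client.query("Summarize this stack trace: ...")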
The caching is disabled by default for security. When enabled, it's thread-safe and includes context isolation for multi-user applications.
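To illustrate the context isolation, a minimal sketch, assuming a cache_context keyword exists for scoping cache entries per user (the actual parameter name may differ):

  from llmswap import LLMClient

  client = LLMClient(cache_enabled=True)

  # cache_context is an assumed keyword for per-user cache scoping.
  # Identical prompts from different users get separate cache entries,
  # so one user's cached response is never served to another.
  a = client.query("What's on my calendar today?", cache_context={"user_id": "alice"})
  b = client.query("What's on my calendar today?", cache_context={"user_id": "bob"})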

Built this from components of a hackathon project. Already at 2.2k downloads on PyPI. Hope it helps others save on API costs during development.

GitHub: https://github.com/sreenathmmenon/llmswap
PyPI: https://pypi.org/project/llmswap/

Comments (1)

rav · 29m ago
How is it "50-90%" savings? If a given application doesn't repeat its queries, surely there's nothing to save by caching the responses?