Show HN: Llmswap – Python package to reduce LLM API costs by 50-90% with caching
I built llmswap to solve a problem I kept hitting in hackathons: burning through API credits while testing the same prompts repeatedly during development.
It's a simple Python package that provides a unified interface for OpenAI, Anthropic, Google Gemini, and local models (Ollama), with built-in response caching that can cut API costs by 50-90%.
Key features:

- Intelligent caching with TTL and memory limits
- Context-aware caching for multi-user apps
- Auto-fallback between providers when one fails (see the sketch below)
- Zero configuration: works with environment variables
from llmswap import LLMClient

client = LLMClient(cache_enabled=True)

response = client.query("Explain quantum computing")  # first call hits the provider API
response = client.query("Explain quantum computing")  # identical query returns from cache instantly (free)
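For the auto-fallback, the idea is simply to try providers in order and move on when one errors out. Here's a simplified sketch of the pattern (illustrative only, not the actual implementation; the provider ordering and error handling below are assumptions):

# Illustrative sketch of the fallback pattern (not llmswap's internals):
def query_with_fallback(prompt, clients):
    # 'clients' is an ordered list, e.g. one LLMClient per configured provider.
    last_error = None
    for client in clients:
        try:
            return client.query(prompt)
        except Exception as exc:  # real code would catch provider-specific errors
            last_error = exc
    raise last_error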
Caching is disabled by default for security. When enabled, it's thread-safe and includes context isolation for multi-user applications (see the sketch below).

Built this from components of a hackathon project. Already at 2.2k downloads on PyPI. Hope it helps others save on API costs during development.
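To make context isolation concrete: cache entries are keyed on the request context as well as the prompt, so a response cached for one user is never served to another. A simplified, illustrative sketch of that keying (the real cache key presumably also covers provider and model settings):

import hashlib
import json

# Illustrative sketch of context-aware cache keying (not llmswap's actual code):
def cache_key(prompt, context=None):
    # Fold the user/tenant context into the key so responses cached
    # for one user are never returned to another.
    payload = json.dumps({"prompt": prompt, "context": context}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

key_a = cache_key("What's my balance?", context={"user_id": "alice"})
key_b = cache_key("What's my balance?", context={"user_id": "bob"})
assert key_a != key_b  # same prompt, different users -> separate cache entries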
GitHub: https://github.com/sreenathmmenon/llmswap
PyPI: https://pypi.org/project/llmswap/