Show HN: AgentGuard – Auto-kill AI agents before they burn through your budget
42 points by dipampaul17 | 26 comments | 7/31/2025, 5:54:04 AM | github.com
Your AI agent hits an infinite loop and racks up $2000 in API charges overnight. This happens weekly to AI developers.
AgentGuard monitors API calls in real-time and automatically kills your process when it hits your budget limit.
How it works:
Add 2 lines to any AI project:
const agentGuard = require('agent-guard');
await agentGuard.init({ limit: 50 }); // $50 budget

// Your existing code runs unchanged
const OpenAI = require('openai');
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const response = await openai.chat.completions.create({...});
// AgentGuard tracks costs automatically
When your code hits $50 in API costs, AgentGuard stops execution and shows you exactly what happened.
Why I built this:
I got tired of seeing "I accidentally spent $500 on OpenAI" posts. Existing tools like tokencost help you measure costs after the fact, but nothing prevents runaway spending in real-time.
AgentGuard is essentially a circuit breaker for AI API costs. It's saved me from several costly bugs during development.
Limitations: Only works with OpenAI and Anthropic APIs currently. Cost calculations are estimates based on documented pricing.
Source: https://github.com/dipampaul17/AgentGuard
Install: npm i agent-guard
It's an... intrusive solution. Glad to hear it works for you though.
See https://docs.litellm.ai/docs/proxy/users
When I want to try a new editor, vs code plugin or software, I only have to point it at my litellm proxy and immediately have access to all of my providers and models I’ve configured, no extra setup. It’s like a locally hosted openrouter that doesn’t charge you for routing. I can just select a different provider as easy as choosing the model in the software; switching from “openai/gpt-4o” to “groq/moonshotai/kimi-k2-instruct”, for example.
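For illustration, here's a rough sketch of what that looks like from the OpenAI Node client (the localhost URL, port, and key variable are assumptions about a typical local LiteLLM setup, not anything from this thread):

const OpenAI = require('openai');
// Point any OpenAI-compatible client at the local LiteLLM proxy instead of api.openai.com
const client = new OpenAI({
  baseURL: 'http://localhost:4000',        // assumed proxy address
  apiKey: process.env.LITELLM_PROXY_KEY,   // a key issued by the proxy, not a provider key
});
// Switching providers is just a different model string; the proxy handles routing
const response = await client.chat.completions.create({
  model: 'groq/moonshotai/kimi-k2-instruct',
  messages: [{ role: 'user', content: 'hello' }],
});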
You can use litellm or OpenAI protocols which makes it compatible with most software. Add on ollama proxy and you can proxy ollama requests from software that doesn’t support specifying OpenAI’s base address but that does support ollama (a not uncommon situation). That combo covers most software.
So yes, to me it is absolutely worth running locally and as easy as editing a config file and starting a docker (or a shell script to open a venv and start litellm, if you prefer).
The only drawback I've found so far is that not all providers accurately respond with their model information, so you sometimes have to configure models/pricing/limits manually in the config (about 5 lines of text that can be copy/pasted and edited). All the SOTA models are pre-configured and kept relatively up to date, but one can expect updates to lag behind real pricing changes.
The UI is only necessary if you want to set up API key/billing restrictions, which requires a db, but that is rather trivial with docker as well.
And if this is really a problem, why not funnel your AI agents through a proxy server which they all support instead of this hacky approach? It would be super easy to build a proxy server that keeps track of costs per day/session and just returns errors once you hit a limit.
[1] LiteLLM: https://www.litellm.ai/
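For what it's worth, a minimal sketch of that kind of proxy in Node/Express (the budget, the per-token prices, and the single-endpoint scope are all assumptions for illustration; this is not AgentGuard or LiteLLM code):

const express = require('express');

const DAILY_BUDGET_USD = 50;
const PRICE = { prompt: 2.5 / 1e6, completion: 10 / 1e6 }; // assumed per-token prices
let spentToday = 0; // daily reset omitted for brevity

const app = express();
app.use(express.json());

app.post('/v1/chat/completions', async (req, res) => {
  // Refuse new requests once the budget is spent
  if (spentToday >= DAILY_BUDGET_USD) {
    return res.status(429).json({ error: 'daily budget exceeded' });
  }
  // Forward the request upstream unchanged
  const upstream = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(req.body),
  });
  const data = await upstream.json();
  // Accumulate estimated cost from the reported token usage
  if (data.usage) {
    spentToday +=
      data.usage.prompt_tokens * PRICE.prompt +
      data.usage.completion_tokens * PRICE.completion;
  }
  res.status(upstream.status).json(data);
});

app.listen(4000);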
> "The README now matches what developers actually experience: two lines of code, automatic tracking, no code changes needed."
Hey OP - next time perhaps at least write the commit messages yourself?
[1] https://github.com/dipampaul17/AgentGuard/blob/51395c36809aa...
[2] https://github.com/dipampaul17/AgentGuard/commit/d49b361d7f3...
[3] https://github.com/dipampaul17/AgentGuard/blob/083ae9896459b...
It's kind of crazy that people use these multi-billion-parameter machine learning models to do search/replace of words in text files, rather than the search/replace in their code editor. I wonder what the efficiency difference is; it must be a 1000x or even 10000x difference?
Don't get me wrong, I use LLMs too, but mostly for things I wouldn't be able to do myself (like isolated math-heavy functions I can't be bothered to understand the internals of), not for trivial things like changing "test" to "step" across five files.
I love that the commit ends with
> Codebase is now enterprise-ready with professional language throughout
Like "enterprise-ready" is about error messages and using "Examples" instead of "Demo".
> Polish README: Remove downloads badge and clean up styling
> - Removed downloads badge ___as requested___
> The foundation is bulletproof. Time to execute the 24-hour revenue sprint.
Comedy gold. This is one of those times where I can't figure out if the author is in on the joke, or if they're actually so deluded that they think this doesn't make them look idiotic. If it's the latter, we need to bring bullying back.
Either way it's hilarious.
It seems like he's still stuck in the "If I just say to my AI that I want a production-ready package that people will pay me $99/month for, I'll get it eventually, right?" phase of discovering LLMs.
The end result is many commits saying "fixed all issues, enterprise-ready now as requested!", each adding 500 lines of code and causing more issues.
The funniest part, to me, is that this only damages his image, instead of solidifying it. We've had so many applicants at my company recently where we go to their github, and they have 10 repositories all obviously vibe-coded together, acting like they made some amazing stuff. Instant deletion of application, no coming back from that - this person would NOT get a job here.
If I were using something like this, I think I'd rather have it wrap the AI API clients. Then it can throw an error if it doesn't recognise the client library I'm using. As it is, it'll just silently fail to monitor if what I'm using isn't in its supported list (whatever that is!)
I do think the idea is good though, just needs to be obvious how it will work when used and how/when it will fail.
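Something like the following, perhaps, as a sketch of the wrap-the-client idea; the function name, price numbers, and budget handling are all hypothetical, not AgentGuard's API:

// Explicitly wrap a client and refuse anything unrecognised, rather than silently not monitoring
function wrapClient(client, { limitUsd }) {
  const looksLikeOpenAI = client && client.chat && client.chat.completions;
  if (!looksLikeOpenAI) {
    throw new Error('Unsupported client library: refusing to run unmonitored');
  }
  let spent = 0;
  const original = client.chat.completions.create.bind(client.chat.completions);
  client.chat.completions.create = async (...args) => {
    if (spent >= limitUsd) throw new Error(`Budget of $${limitUsd} exceeded`);
    const resp = await original(...args);
    // Estimate cost from reported usage (assumed per-token prices)
    const usage = resp.usage ?? { prompt_tokens: 0, completion_tokens: 0 };
    spent += usage.prompt_tokens * 2.5e-6 + usage.completion_tokens * 1e-5;
    return resp;
  };
  return client;
}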
While you are at it, use the term "guardrails" as that is quite fashionable.