Show HN: Price Per Token – LLM API Pricing Data
262 points | alexellman | 115 comments | 7/25/2025, 12:39:41 PM | pricepertoken.com
The LLM providers are constantly adding new models and updating their API prices. Anyone building AI applications knows these prices matter a lot to their bottom line. The only way I'm aware of to check them is to go to each provider's individual pricing page.
To solve this inconvenience, I spent a few hours making pricepertoken.com, which has the latest models' up-to-date prices all in one place.
Thinking about adding image models too, especially since you have multiple options (fal, replicate) for running the same model, and the prices are not always the same.
We have solved this problem by working with the providers to implement a prices and models API that we scrape, which is how we keep our marketplace up to date. It's been a journey; a year ago it was all happening through conversations in shared Slack channels!
The pricing landscape has become more complex as providers have introduced, e.g., different prices for tokens depending on prompt length, caching, etc.
I do believe the right lens on this is actually the price per token by endpoint, not by model; there are fast/slow versions, thinking/non-thinking, etc. that can sometimes also vary by price.
The point of this comment is not to self promote, but we have put a huge amount of work into figuring all of this out, and have it all publicly available on OpenRouter (admittedly not in such a compact, pricing-focused format though!)
https://github.com/tekacs/llm-pricing
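(If you want to pull this programmatically: a minimal sketch against OpenRouter's public models endpoint; the response shape is per their docs at the time of writing, so verify it before relying on it.)

    import requests  # assumes: pip install requests

    # Sketch: pull per-token pricing from OpenRouter's public models endpoint.
    resp = requests.get("https://openrouter.ai/api/v1/models", timeout=30)
    resp.raise_for_status()
    for model in resp.json()["data"]:
        pricing = model.get("pricing", {})
        prompt = float(pricing.get("prompt", 0))
        completion = float(pricing.get("completion", 0))
        # Prices are quoted in USD per token; scale to per-million for readability.
        print(f"{model['id']}: ${prompt * 1e6:.2f}/M in, ${completion * 1e6:.2f}/M out")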
[1] https://ai.google.dev/gemini-api/docs/pricing#gemini-2.5-fla...
edit: my bad, I was wrong, I shouldn't have responded like this
Your website reports $0.30 for input, and that wouldn't make any sense, as it would be priced the same as the bigger Flash model.
Left a really, really bad taste in my mouth.
- for other models there are providers that serve the same model with different prices
- each provider optimizes for different parameters: speed, cost, etc.
- the same model can still be different quantizations
- some providers offer batch pricing, others don't (the Grok API, for example, does not)
And there are plenty of other parameters to filter on: thinking vs. non-thinking, multi-modal or not, etc., not to even mention benchmark rankings.
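As a toy illustration of the provider-spread point, a sketch over made-up offer records (all names and prices invented):

    from collections import defaultdict

    # Made-up offers: (model, provider, USD per million input tokens).
    offers = [
        ("llama-3.1-70b", "provider-a", 0.59),
        ("llama-3.1-70b", "provider-b", 0.90),
        ("llama-3.1-70b", "provider-c", 2.70),
        ("qwen3-coder",   "provider-a", 1.00),
        ("qwen3-coder",   "provider-d", 5.00),
    ]

    by_model = defaultdict(list)
    for model, provider, price in offers:
        by_model[model].append((price, provider))

    for model, quotes in sorted(by_model.items()):
        quotes.sort()
        low, high = quotes[0][0], quotes[-1][0]
        # The cheapest and priciest quote for the same model can differ by 5-10x.
        print(f"{model}: ${low:.2f}-${high:.2f}/M ({high / low:.1f}x spread)")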
https://artificialanalysis.ai gives a blended cost number, which helps with sorting a bit, but any blended model of input/output costs is going to change depending on what you're doing.
I'm still holding my breath for a site that has a really nice comparison UI.
Someone please build it!
We have a simple model comparison tool that is not-at-all-obvious to find on the website, but hopefully can help somewhat. E.g.
https://openrouter.ai/compare/qwen/qwen3-coder/moonshotai/ki...
- An image will take 10x the tokens on gpt-4o-mini vs gpt-4.
- On Gemini 2.5 Pro, output tokens are billed as tokens, except if you are using structured output; then every character is counted as a token for billing.
- ...
Having the price per token is nice, but what is really needed is to know how much a given query / answer will cost you, as not all tokens are equal.
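To make that concrete, a rough sketch of estimating one exchange's cost with tiktoken (prices are illustrative placeholders, and the image / structured-output quirks above are exactly what this naive version misses):

    import tiktoken  # assumes: pip install tiktoken

    # Illustrative prices in USD per million tokens; check the provider's page.
    INPUT_PER_M, OUTPUT_PER_M = 0.15, 0.60

    enc = tiktoken.encoding_for_model("gpt-4o-mini")
    prompt = "Summarize the plot of Hamlet in three sentences."
    completion = "Hamlet's uncle murders the king and seizes the throne..."

    cost = (len(enc.encode(prompt)) * INPUT_PER_M
            + len(enc.encode(completion)) * OUTPUT_PER_M) / 1e6
    print(f"~${cost:.6f} for this exchange")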
Can you elaborate on this? I don't quite understand the difference.
Are there any tutorials you can recommend for somebody interested in getting something running locally?
You can, almost, convert the number of parameters to GB of memory needed. For example, deepseek-r1:7b needs about 7 GB of memory to run locally.
Context window matters: the more context you need, the more memory you'll need.
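As a back-of-the-envelope sketch (the constants are assumptions: ~1 byte per parameter at 8-bit quantization, and a KV cache whose per-token size varies a lot by model):

    def estimate_memory_gb(params_billions, bytes_per_param=1.0,
                           context_tokens=8192, kv_bytes_per_token=0.5e6):
        # Weights: ~1 byte/param at 8-bit quantization (0.5 at 4-bit, 2 at fp16).
        weights_gb = params_billions * bytes_per_param
        # KV cache grows linearly with context; ~0.5 MB/token fits a 7B fp16 cache.
        kv_gb = context_tokens * kv_bytes_per_token / 1e9
        return weights_gb + kv_gb

    print(f"{estimate_memory_gb(7):.1f} GB")        # ~7B model at 8-bit
    print(f"{estimate_memory_gb(70, 0.5):.1f} GB")  # ~70B model at 4-bit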
If you are looking for AI devices at $2500, you'll probably want something like this [1]. A unified memory architecture (which will mean LPDDR5) will give you the most memory for the least amount of money to play with AI models.
[1] https://frame.work/products/desktop-diy-amd-aimax300/configu...
When local models don’t cut it, I like Gemini 2.5 flash/pro and gemini-cli.
There are a lot of good options for commercial APIs and for running local models. I suggest choosing a good local and a good commercial API, and spend more time building things than frequently trying to evaluate all the options.
It's been a while since I checked out Mini prices. Today, $2400 buys an M4 Pro with all the cores, 64GB RAM, and 1TB storage. That's pleasantly surprising...
There are no out-of-the-box solutions to run a fleet of models simultaneously, or containerized, either.
So the closed-source solutions in the cloud are light-years ahead, and it's been this way for 15 months now, no signs of stopping.
You would probably need multiple models running in distinct containers, with another process coordinating them.
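Something like this untested sketch, assuming two Ollama-style servers in separate containers (the /api/generate endpoint and payload follow Ollama's API; ports and model names are placeholders):

    import requests  # assumes: pip install requests

    # Hypothetical fleet: two model servers, each in its own container.
    FLEET = {
        "drafter":  ("http://localhost:11434", "llama3.1:8b"),
        "reviewer": ("http://localhost:11435", "qwen3:14b"),
    }

    def ask(role, prompt):
        base, model = FLEET[role]
        # Ollama's non-streaming generate endpoint.
        r = requests.post(f"{base}/api/generate",
                          json={"model": model, "prompt": prompt, "stream": False},
                          timeout=120)
        r.raise_for_status()
        return r.json()["response"]

    draft = ask("drafter", "Write a haiku about token pricing.")
    review = ask("reviewer", f"Critique this haiku briefly:\n{draft}")
    print(draft, "\n---\n", review)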
Openrouter is a good alternative. Added bonus that you can also see where the open models come in, and can make an educated guess on the true cost / size of a model, and how likely it is that it's currently subsidised.
A limitation though, at least the last time I checked, is that you only get a single provider returned per model. That's fine for the major commercial models that all have the same pricing on each provider, but makes it hard to rely on for open source models, which tend to have many providers offering them at different price points (sometimes very different, like a 5x or 10x spread).
- Off-peak pricing by DeepSeek
- Batch pricing by OpenAI and Anthropic
- Context window differentiated pricing by Google and Grok
- Thinking vs non-thinking token pricing by Qwen
- Input token tiered pricing by Qwen coder
I originally posted here: https://x.com/paradite_/status/1947932450212221427
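To make concrete how much scheme variety a price calculator has to absorb, here is a toy sketch with made-up rates covering off-peak, cache, and context-tier adjustments (all constants illustrative, not any provider's real prices):

    from dataclasses import dataclass

    @dataclass
    class Pricing:
        # All rates in USD per million tokens; values are illustrative only.
        input_per_m: float = 1.00
        output_per_m: float = 4.00
        cached_input_per_m: float = 0.25      # cache-hit discount
        long_context_multiplier: float = 2.0  # tier above the threshold
        long_context_threshold: int = 128_000
        off_peak_multiplier: float = 0.5

    def cost(p, input_tokens, output_tokens, cached_tokens=0, off_peak=False):
        fresh = input_tokens - cached_tokens
        c = (fresh * p.input_per_m + cached_tokens * p.cached_input_per_m
             + output_tokens * p.output_per_m) / 1e6
        if input_tokens > p.long_context_threshold:
            c *= p.long_context_multiplier
        if off_peak:
            c *= p.off_peak_multiplier
        return c

    print(f"${cost(Pricing(), 200_000, 5_000, cached_tokens=150_000, off_peak=True):.4f}")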
[1] https://llm-prices.com/
[0] https://developers.googleblog.com/en/gemini-2-5-models-now-s...
We have uploaded mostly the OpenRouter API models, but are trying to do it in a useful way that personalizes calculation and comparison. If someone would like to test it or see a demo, we would be glad for any feedback.
The UI, however, is really clean and straight to the point. I like the interface, but miss the content.
Mistral, Llama, Kimi, Qwen…?
Not a site of value unless it covers the whole market.
Here's the current version:
1. Pull some large tech company's open-source tool's JS file
2. Extract an internal JSON blob that contains otherwise difficult-to-get information
3. Parse it and use what I need from within it for my tool
I would love to replace this with an API call.
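For the curious, the pipeline looks roughly like this (URL, variable name, and regex are placeholders; real bundles often need a more tolerant extraction step):

    import json
    import re
    import requests  # assumes: pip install requests

    # Hypothetical source file; the real URL and pattern depend on the tool.
    JS_URL = "https://example.com/static/pricing-tool.js"

    js = requests.get(JS_URL, timeout=30).text
    # Grab an embedded object literal assigned to a known variable.
    match = re.search(r"var\s+PRICING\s*=\s*(\{.*?\});", js, re.DOTALL)
    if match is None:
        raise SystemExit("blob not found; the bundle layout probably changed")
    blob = json.loads(match.group(1))  # works only if the literal is valid JSON
    print(blob.keys())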
What tools / experiments exist out there to push these cheaper models to output more tokens / use more CoT tokens to achieve the quality of more expensive models?
E.g., the Gemini 2.5 Flash / Pro ratio is 1 1/3 for input, 1/8 for output... Surely there's a way to ask Flash to critique its work more thoroughly to get to Pro-level performance and still save money?
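I don't know of a turnkey tool, but the basic draft-critique-revise loop is easy to try with any OpenAI-compatible client (the model name is a stand-in, and whether this actually reaches Pro-level quality is the open question):

    from openai import OpenAI  # assumes: pip install openai, API key in env

    # Any OpenAI-compatible endpoint works; Gemini exposes one as well.
    client = OpenAI()
    CHEAP_MODEL = "gpt-4o-mini"  # stand-in for a "flash-class" model

    def chat(prompt):
        r = client.chat.completions.create(
            model=CHEAP_MODEL, messages=[{"role": "user", "content": prompt}])
        return r.choices[0].message.content

    task = "Explain why the sky is blue, rigorously but briefly."
    draft = chat(task)
    critique = chat(f"List concrete flaws in this answer:\n{draft}")
    final = chat(f"Task: {task}\nDraft: {draft}\nCritique: {critique}\n"
                 "Rewrite the draft fixing every flaw.")
    print(final)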
Just now I was testing the new Qwen3-thinking model. I ran the same prompt five times. The costs I got, sorted: 0.0143, 0.0288, 0.0321, 0.0389, 0.048. And this is for a single model.
Also, in my experience, sonnet-4 is cheaper than gemini-2.5-pro, despite token costs being higher.
- Filter by model "power" or price class. I want to compare the mini models, the medium models, etc.
- I'd like to see a "blended" cost which does 80% input + 20% output, so I can quickly compare the overall cost.
Great work on this!
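The blended number itself is a one-liner; a sketch with the 80/20 weighting suggested above (the weights are workload assumptions, not a standard):

    def blended_per_m(input_per_m, output_per_m, input_share=0.8):
        # 80% input / 20% output by default, per the suggestion above.
        return input_share * input_per_m + (1 - input_share) * output_per_m

    # Illustrative prices (USD per million tokens), not current quotes.
    print(blended_per_m(1.25, 10.00))  # -> 3.0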
Should I use Copilot Pro in agent mode with Sonnet 4, or is it cheaper to use Claude with Sonnet 4 directly?
You should also aggregate prices from Vertex AI and AWS Bedrock.
I am new to this hobby, but would like to know more about what experienced people think and do.
https://claude.ai/share/20b36bd3-d817-4228-bc33-aa7c4910bc2b (the preview seems to only work in Chrome; for Firefox you have to download the HTML).
Plus maybe half an hour to verify and correct the prices and another few minutes for domain and hosting.
The author posted it himself, so why not spend an hour or two more and have a decent list with at least half a dozen providers and 100 models? In its current state it is just a mockup.
It has been at the top of the front page for three hours; if the author had added one model per minute to the .json, it would now list 200 models.
I plan on adding cache prices and making the list more comprehensive.