Claude Code weekly rate limits
Next month, we're introducing new weekly rate limits for Claude subscribers, affecting less than 5% of users based on current usage patterns.
Claude Code, especially as part of our subscription bundle, has seen unprecedented growth. At the same time, we’ve identified policy violations like account sharing and reselling access—and advanced usage patterns like running Claude 24/7 in the background—that are impacting system capacity for all. Our new rate limits address these issues and provide a more equitable experience for all users.
What’s changing: Starting August 28, we're introducing weekly usage limits alongside our existing 5-hour limits:
- Current: Usage limit that resets every 5 hours (no change)
- New: Overall weekly limit that resets every 7 days
- New: Claude Opus 4 weekly limit that resets every 7 days

As we learn more about how developers use Claude Code, we may adjust usage limits to better serve our community.

What this means for you: Most users won't notice any difference. The weekly limits are designed to support typical daily use across your projects. Most Max 5x users can expect 140-280 hours of Sonnet 4 and 15-35 hours of Opus 4 within their weekly rate limits. Heavy Opus users with large codebases or those running multiple Claude Code instances in parallel will hit their limits sooner. You can manage or cancel your subscription anytime in Settings. We take these decisions seriously. We're committed to supporting long-running use cases through other options in the future, but until then, weekly limits will help us maintain reliable service for everyone.
We also recognize that during this same period, users have encountered several reliability and performance issues. We've been working to fix these as quickly as possible, and will continue addressing any remaining issues over the coming days and weeks.
–The Anthropic Team
I feel like someone is going to reply that I'm too reliant on Claude or something. Maybe that's true, but I'd feel the same about the prospect of loosing ripgrep for a week, or whatever. Loosing it for a couple of days is more palatable.
Also, I find it notable they said this will affect "less than 5% of users". I'm used to these types of announcements claiming they'll affect less than 1%. Anthropic is saying that one out of every 20 users will hit the new limit.
*edited to change “pro” to “plus”
As a pedantic note, I would say 'ration'. Things you hoard don't magically go away after some period of time.
Rationed/hoarded do imply, to me, something different about how the quantity came to be though. Rationed being given or setting aside a fixed amount, hoarded being that you stockpiled/amassed it. Saying "you hoarded your rations" (whether they will expire) does feel more on the money than "you ration your rations" from that perspective.
I hope this doesn't come off too "well aktually", I've just been thinking about how I still realize different meanings/origins of common words later in life and the odd things that trigger me to think about it differently for the first time. A recent one for me was that "whoever" has the (fairly obvious) etymology of who+ever https://www.etymonline.com/word/whoever vs something like balloon, which has a comparatively more complex history https://www.etymonline.com/word/balloon
I just love this community for these silly things.
Rationing suggests a deliberate, calculated plan: we’ll eat this much at these particular times so our food lasts that long. Hoard seems more ad hoc and fear-driven: better keep yet another beat-up VGA cable, just in case.
Counterexample: animals hoarding food for winter time, etc.
One could theoretically ration their rations out further... but that would require knowing your usage well enough to set the remaining fixed amounts - which is precisely what's missing in the interface.
So, back to hoarding.
One day a few hours of prompting is fine, another day you'll hit your weekly limit and you're out for seven days.
While still paying your subscription.
I can't think of any other product or service which operates on this basis - where you're charged a set fee, but the access you get varies from hour to hour entirely at the provider's whim. And if you hit a limit which is a moving target you can't even check, you're locked out of the service.
It's ridiculous. Begging for a lawsuit, tbh.
What they could do is pay as you go, with pricing increasing with the demand (Uber style), but I don't think people would like that much.
Decided to give PRO a try when I kept getting terrible results from the $20 option.
So far it's perhaps 20% improved in complex code generation.
It still has the extremely annoying ~350 line limit in its output.
It still IGNORES EXPLICIT CONTINUOUS INSTRUCTIONS, e.g.: do not remove existing comments.
The opaque overriding rules - despite it begging forgiveness when it ignores instructions - are extremely frustrating!!
Often they're better at recognizing failures to stick to the rules and fixing the problems than they are at consistently following the rules in a single shot.
This does mean that often having an LLM agent do a thing works but is slower than just doing it myself. Still, I can sometimes kick off a workflow before joining a meeting, so maybe the hours I've spent playing with these tools will eventually pay for themselves in improved future productivity.
But for things I have no idea about, like medicine, it feels very convincing. Am I at risk?
People don’t understand Dunning-Kruger. People are prone to biases and fallacies. Likely all LLMs are inept at objectivity.
My instructions to LLMs are always strictness, no false claims, Bayesian likelihoods on every claim. Some models ignore the instructions voluntarily, while others stick strictly to them. In the end it doesn’t matter when they insist on 99% confidence on refuted fantasies.
Reality is probably that there’s a backlog item to implement a view, but it’s hard to prioritize over core features.
Back to the conspiracy ^^
It's even harder to prioritize when the feature you pay to develop probably costs you money.
I have zero doubt that this is working exactly as intended. We will keep all our users at 80% of what we sold them by keeping them anxious about how close they are to the limit.
Hover on it on a desktop, it’ll show how many requests you have left.
This isn't like a gym membership where people join aspirationally. No one's new year's resolution is "I'm going to use o3 more often."
Unless you use "free" GPT 4.1 like MS wants you to (not the same as Claude, even with Beast Mode). And how long is that going to be free? Because it feels like a design to simply push you to an MS product (MS > OpenAI) instead of a third party.
So what happens a year from now? Paid GPT 5.1? With 4.1 being removed? If it were not for the insane prices of actual large-mem GPUs and the slowness of large models, I would be using LLMs at home. Right now MS/Anthropic/OpenAI are right in that zone where it's not too expensive yet to go full local LLM.
The human brain is stupid and remarkably exploitable. Just a teensy little bit of information hiding can elicit strange and self-destructive behavior from people.
You aren't cut off until you're cut off, then it's over completely. That's scary, because there's no recourse. So people are going to try to avoid that as much as possible. Since they don't know how much they're using, they're naturally going to err on the side of caution - paying for more than they need.
The dark pattern isn’t the usage limit. It’s the lack of information about current and remaining usage.
If I sit down for dinner at an all-you-can-eat buffet, I get to decide how much I’m having for dinner. I don’t mind if they don’t let me take leftovers, as it is already understood that they mean as much as I can eat in one sitting.
If they don’t want folks to take advantage of an advertised offer, then they should change their sales pitch. It’s explicitly not gaming any system to use what you’re paying for in full. That’s your right and privilege as that’s the bill of goods you bought and were sold.
I also find it hard to believe 5% of customers are doing that, though.
As I said, I have trouble believing this constitutes 5% of users, but it constitutes something and yeah, I feel Anthropic is justified in putting a cap on that.
I also wouldn't consider my usage extreme. I never use more than one instance, don't run overnight, etc.
For example, I don’t mind that Netflix pauses playback after playing continuously for a few episodes of a show, because the options they present me with acknowledge different use cases. The options are: stop playing, play now and ask me again later, and play now and don’t ask me again. These options are kind to the user because they don’t disable the power user option.
I haven't yet run into this limit...
Do I read this correctly? Only 100 messages per week, on the pro plan worth a few hundred bucks a month?!
Per their website: https://help.openai.com/en/articles/9793128-what-is-chatgpt-...
There are no usage caps on pro users (subject to some common sense terms of use).
I have a pro plan and I hammer o3–I’d guess more than a hundred a day sometimes—and have never run into limits personally
Wouldn’t shock me if something like that happened but haven’t seen evidence of it yet
Apple is your business partner, doing marketing and distribution for you, and shares its user base. Bloomberg terminals provide real-time data and UI to non-technical finance people. Github provides you a Git hosting service so you don't need to set up and maintain servers. MATLAB (although there are Octave, Python and open alternatives) sells a numerical computation environment to non-CS engineers. Xilinx sells its hardware and dev tools. Game devs use Unity because they want to focus on gameplay and not game engine development.
These are all examples of division of labor. This time, however, you have to pay for your core competency, because you cannot compete with a good AI coder in the long run. The value you provide diminishes to almost nothing. Yes, you can write prompts, but anyone, even a mediocre LLM, can write prompts these days. If you need some software, you don't need to hire SW engineers anymore. A handful of vendors dominate the SW development market. Yes, you can switch. But only between the 3 or 4 tech giants. It's an oligopoly.
If we have FOSS alternatives, at least we can build new services around them and can move on to this new era. We can adapt. Otherwise, we become a human frontend between the client and the AI giants.
But indeed it always struck me that some developers decided to become Apple developers and sacrifice 30% of everything they ever produce to Apple.
I would argue that it might be a bit different though, because when doing iOS development it's possible that you don't lose your core skill, which is building software, and that you can switch to another platform with relative ease. What I think might happen with LLMs is that people will lose the core skill (maybe not the generation who did do LLM-less development, but some devs might eventually never know other ways to work, and will become digital vassals of whatever service managed to kill all others).
In exchange for 500% more paid users
just three possible examples
It will change. There will be FOSS models, once it no longer takes hundreds of millions of dollars to train them.
The 'take me from A to N' is a pretty broad problem that can have many different solutions. Is that comparable?
We can all see this ending up in an oligopoly, no?
If you meant it goes down for good, then I'm sure it would be annoying for a few weeks for the FOSS ecosystem, just the time to migrate elsewhere, but there is not much GitHub-specific we would really miss.
Sure, devs can still work without AI.
But if the developer who uses AI has more output than the one who doesn't, it naturally incentivizes everyone to leverage AI more and more.
And note that I objected to online services; local LLMs don't have the same issues.
Think of driving a car. If the shortest path (in terms of travel time) goes through a traffic jam, and there is a longer path where you can drive much faster, it's very likely that most people will feel more efficient on the longer path.
Also the slowdown from using LLMs might be more subtle and harder to measure. It might happen at code review time, in handling more bugs and incidents, harder maintenance, recovering your deleted DB ;)...
I can see the impact on my own output both in quantity and quality (LLMs can come up with ideas I would not come up with myself, and are very useful for tinkering and quickly testing different solutions).
As with any tool, it is up to the user to make the best of it and understand the limits.
At this point it is clear that naysayers:
1) either don't understand our job
2) or haven't given AI tools the proper stress testing in different conditions
3) or are luddites being defensive about the "old" world
[1] https://www.antirez.com/news/154
```
The fundamental requirement for the LLM to be used is: don’t use agents or things like editor with integrated coding agents. You want to:
* Always show things to the most able model, the frontier LLM itself.
* Avoid any RAG that will show only part of the code / context to the LLM. This destroys LLMs performance. You must be in control of what the LLM can see when providing a reply.
* Always be part of the loop by moving code by hand from your terminal to the LLM web interface: this guarantees that you follow every process. You are still the coder, but augmented.
```
Not sure about you, but I think this process, which your source seems to present as a prerequisite to using LLMs efficiently (and which seems good advice to me too, and actually very similar to how I use LLMs myself), must be followed by less than 1% of LLM users.
The deciding factor is not speed. It is knowledge. Will I be able to dish out a great compiler in a week? Probably not. But an especially knowledgeable compiler engineer might just do it, for a simple language. Situations like this are the only 10x we have in our profession, if we don't count completely incapable people. The use of AI doesn't make you 1000x. It might make you output an infinite factor of AI slop more, but then you are just pushing the maintenance burden to a later point in time. In total it might make your output completely useless in the long run, making you a 0x dev in the worst case.
So far almost no code I got from LLMs was acceptable to stay as suggested. I found it useful in cases, when I myself didn't know what a typical (!) way is to do things with some framework, but even then often opted for another way, depending on my project's goals and design. Sometimes useful to get you unstuck, but oh boy I wouldn't let it code for me. Then I would have to review so much bad code, it would be very frustrating.
Nothing about using an LLM removes skills and abilities you already had before it.
And yes, the goal might be to only use it for boilerplate or first draft. But that's today, people are lazy, just wait for the you of tomorrow
Just because you state it, it doesn't make it true. I could tell you that taking buses or robotaxis doesn't change a bit your ability to drive.
Funny story: the widespread use of Knorr soup stock has already made people unable to cook their own stock, or, even worse, lose the skill to season their soup from just basic, fresh ingredients.
Source: my mom.
And just as with cooking: most people won't care - and the same goes with LLMs. It can be good enough... Less efficient? Meh - cloud. AI slop image? Meh - cheaper than paying an artist. LLMs to get kids through school? Meh - something something school-of-life.
I look around and see many poorly educated people leaning hard into LLMs. These people are confusing parroting their prompt output as knowledge, especially in the education realm. And while LLMs may not "remove skills and abilities you already had before it" - you damn sure will lose any edge you had over time. It's a slippery slope of trading a honed skill for convenience. And in some cases that may be a worthwhile trade. In others that is a disaster waiting to happen.
Now, maybe that is the future (no more/extremely little human-written code). Maybe that's a good thing in the same way that "x technological advancement means y skill is no longer necessary" - like how the advent of readily-accessible live maps means you don't need to memorize street intersections and directions or whatever. But it is true.
There was research about vibe coding that had similar conclusion. Feels productive but can take longer to review.
the moment you generate code you don't instantly understand you are better off reading the docs and writing it yourself
I regularly hit the Pro limits 3 times a day using Sonnet. If I use Claude Code & Claude it's over in about 30 minutes. No multi 24/7 agent whatever, no multiple windows open (except using Claude to write a letter between Claude Code thoughts).
I highly doubt I am a top 5%er - but I won't be shocked if my week ends on a Wednesday. I was just starting to use Claude chat more as it is in my subscription, but if I can not rely on it to be available for multiple days it's functionally useless - I won't even bother.
Can you share what you're doing? I've been experimenting with Claude Code and I feel like I have to be doing a lot with it before I even start seeing the usage warning limits on the $20/month plan.
When I see people claiming they're getting rate limited after 30 minutes on the $100/month plan I have a hard time understanding what they're doing so different.
For what it's worth I don't use it every day, so maybe there's a separate rate that applies to heavy and frequent users?
And I guess it'll go downhill from here. Anthropic, I wish you the best. Claude is a great tool at good value. But if you keep changing the product after my purchase, that's bad value.
You very well might be a top 5%er among people only on the Pro rather than Max plan
Low danger task so I let it do as it pleased - 30 minutes and was maxed out. Could probably have reduced context with a /clear after every file but then I would have to participate.
I usually use Tasks for running tests, code generation, summarizing code flows, and performing web searches on docs and summarizing the necessary parts I need for later operations.
Running them in parallel is nice if you want to document code flows and have each task focus on a higher level grouping, that way each task is hyper focused on its own domain and they all run together so you don’t have to wait as long, for example:
- “Feature A’s configuration” - “Feature A’s access control” - “Feature A’s invoicing”
So, if rate limits are based on an overall token cost, it is likely that one will hit them first if CC reads a few files and writes a lot of text as output (comments/documentation) rather than if it analyzes a large codebase and then makes a few edits in code.
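A rough illustration of the asymmetry (a sketch only: it assumes Sonnet-class API list prices of roughly $3 per million input tokens and $15 per million output tokens; how the subscription limits are actually metered isn't public, and which session ends up pricier obviously depends on the real token counts):

```python
# Back-of-envelope cost comparison under assumed Sonnet-class list prices.
INPUT_RATE = 3 / 1_000_000    # ~$3 per million input tokens (assumed)
OUTPUT_RATE = 15 / 1_000_000  # ~$15 per million output tokens (assumed)

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session at the assumed per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Reads a few files, then writes a lot of comments/documentation:
write_heavy = session_cost(input_tokens=20_000, output_tokens=80_000)

# Reads a large chunk of the codebase, then emits a few small edits:
read_heavy = session_cost(input_tokens=300_000, output_tokens=3_000)

print(f"write-heavy session: ${write_heavy:.2f}")  # ~$1.26
print(f"read-heavy session:  ${read_heavy:.2f}")   # ~$0.95
```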
Very good point, I find it unlikely that 1/20 users is account sharing or running 24/7 agentic workflows.
The stat would be more interesting if instead of 1 in 20 users, they said x in y of users with at least one commit per business day, or with at least one coding question per day, or whatever.
I suspect this could be a significantly higher percentage of professional users they plan to throttle. Be careful of defining Pro like Apple does if you market to actual professionals who earn based on using your product. Your DAUs might be a different ratio than you expect.
I imagine there are lots of people like me who have a subscription to be aware of the product and do some very light work, but the "real" users who rely on the tool might be badly affected by this.
I can personally think of a few internally licensed products, announced with huge fanfare, which never get used beyond the demo to a VP.
Well, not the entire week, however much of it is left. You said you probably won't hit it -- if you do, it's very likely to be in the last 36 hours (roughly 20% of a week), right? And you can pay for API usage anyway if you want.
Just to nitpick: when the limit is a week, going over it does not mean losing access for a week, but for the remaining time, which would, assuming the limits aren't overly aggressive, mean losing access for at most a couple of days (which you say is more palatable).
I wouldn't say you're too reliant, but it's still good to stay sharp by coding manually every once in a while.
(2) I interpret this change as targeting people who are abusing a single Pro account, using it more like a multi-developer business would, maximizing the number of tokens (multiple sessions running 24/7, always hitting the limits). Anthropic has a business interest in pushing those users to use the API (paying per token) or upgrade to the $200/mo subscription.
(3) While I fear they might regularly continue to push the top x% usage tier users into the higher subscription rate, I also realize this is the first adjustment for token rates of Claude Pro since Claude Code became available on that subscription.
(4) If you don’t want to wait for the next unthrottling, you can always switch to the API usage and pay per token until you are unblocked.
the principle: let's protect against outliers without rocking the behavior of the majority, not at this stage of PMF and market discovery
i'd also project out just how much the compute would cost for the outlier cohort - are we talking $5M, $100M, $1B per year? And then what behaviors will simply be missed by putting these caps in now - is it worth missing out on success stories coming from elite and creative users?
I'm sure this debate was held internally but still...
They undercharged for this product to collect usage data to build better coding agents in the future. It was a ploy for data.
Anecdotally, I use Claude Code with the $20/mo subscription. I just use it for personal projects, so I figured $20 was my limit on what I’d be willing to spend to play around with it. I historically hit my limits just a few times, after ~4hrs of usage (resets every 5hrs). They recently updated the system and I hit my limits consistently within an hour or two. I’m guessing this weekly limit will affect me.
I found a CLI tool (which I found in this thread today) that estimates I’m using ~$150/mo in usage if I paid through the API. Obviously this is very different from my payments. If this was a professional tool, maybe I’d pay, but not as a hobbyist.
I’m guessing that they did, and that that’s what this policy is.
If you’re talking about detecting account sharing/reselling, I’m guessing they have some heuristics, but they really don’t want the bad press from falsely accusing people of that stuff.
my point is that 5% is still a large cohort and they happen to be your most excited/creative cohort. they might not all want to pay a surcharge yet while everyone is discovering the use cases / patterns / etc
having said that, entirely possible burn rate math and urgency requires this approach
The announcement says that, using historical data, less than 5% of users would even be impacted.
That seems kind of clear: The majority of users will never notice.
that 5% is probably the most creative and excited cohort. obviously it's critical to not make the experience terrible for the 95% core, but i'd hate to lose even a minority of the power users who want to build incredible things on the platform
having said that, the team is elite, sure they are thinking about all angles of this issue
But those power users are often your most creative, most productive, and most likely to generate standout use cases or case studies. Unless they’re outright abusing the system, I’d lean toward designing for them, not against them.
if the concern is genuine abuse, that feels like something you handle with escalation protocols: flag unusual usage, notify users, and apply adaptive caps if needed. Blanket restrictions risk penalizing your most valuable contributors before you’ve even discovered what they might build
that's exactly what they have done - the minority of accounts that consume many standard deviations above the mean of resources will be limited, everyone else will be unaffected.
correct me if I'm wrong, it's not like we have visibility into the token limit logic, even on the 5hr window?
but if you use an agent and it tries to include a 500kb json file, yeah, you will light cash on fire
(this happened to me today but the rate limit brought it to my attention.)
For brief, low context interactions, it is crazy how far your money goes.
Now your vibes can be at the beach.
I'm pretty sure they calibrated it so that only the people who max out every 5 hour window consistently get hit by the weekly quota.
Given that I rarely hit the session limits I’m hopeful I won’t be affected, but the complete and utter lack of transparency is really frustrating.
as many other services did, and even some tangible products are implementing, the introduced limit will later on be used to create more tiers and charge you more for the same without providing anything extra. #shrinkflation
Do we even know the Anthropic financials? My guess is that they're probably losing money on all their tiers.
Sorry, I'll just be "that guy" for a moment. Assuming that access is cut at a random time during the week, the average number of days without Claude would be 3.5. That's not reasonable as it's dependent on usage. So assume that you've always been just shy of hitting the limit, and you increase usage by 50%; then you'd hit the limit 4.67 days in. Just 2-3 hours shy of the weekend - a sort of reward for the week's increased effort.
Have a blessed Thuesday.
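For anyone checking the arithmetic above, both numbers fall out of simple assumptions (a uniformly random cutoff time, then a steady usage rate at 1.5x the rate that previously exhausted the quota in exactly 7 days):

```latex
% Cutoff at a uniformly random time t \in [0, 7] days:
E[\text{days locked out}] = \int_0^7 \frac{7 - t}{7}\, dt = 3.5

% Steady usage at 1.5x the just-barely-under-the-limit rate:
t_{\text{hit}} = \frac{7\ \text{days}}{1.5} \approx 4.67\ \text{days}
```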
I know entire offices in Bangladesh share some of these accounts, so I can see how it is a problem.
If it's affecting 5% of users, it might be people who are really pushing it and might not know (hopefully they get a specialized notice that they may see usage differences).
If I had used an LLM, maybe I wouldn't have misspelled "losing" not once but twice and not noticed until after the edit window. <_<
FYI many input methods (including my own) turn two hyphens into an em dash. The em-dash-means-LLM thing is bogus.
1- “More throughput” on the API, but stealth caps in the UI - On Jun 19 Anthropic told devs the API now supports higher per-minute throughput and larger batch sizes, touting this as proof the underlying infra is scaling. Yay!?? - A week later they roll out weekly hard stops on the $100/$200 “Max” plans — affecting up to 5 % of all users by their own admission.
Those two signals don’t reconcile. If capacity really went up, why the new choke point? I keep getting this odd visceral reaction/anticipation that each time they announce something good, we are gonna get whacked on an existing use case.
2- Sub-agents encourage 24x7 workflows, then get punished… The Sub-agent feature docs literally showcase spawning parallel tasks that run unattended. Now the same behavior is cited as “advanced usage … impacting system capacity.”
You can’t market “let Claude handle everything in the background” and then blame users who do exactly that. You’re holding it wrong?
3- Opaqueness forces rationing (the other poster comments re: rationing vs hoarding; I can’t reconcile it being hoarding since it’s use it or lose it.)
There’s still no real-time meter inside Claude/CC, only a vague icon that turns red near 50%. Power users end up rationing queries because hitting the weekly wall means a seven-day timeout. That’s a dark, dark pattern if I’ve ever seen one; I’d think not appropriate for developer tooling. (ccusage is a helpful tool that shouldn’t be needed!)
The “you’re holding it wrong” response seems so bizarre to me, meanwhile all of the other signaling is about more usage, more use cases, more dependency.
Yeah, the new sub-agents feature (which is great) is effectively unusable with the current rate limits.
Internet, text messages, etc are roughly that: the direct costs are so cheap.
That’s not the case with LLM’s at this moment. There are significant direct costs to each long-running agent.
But the cost to Bell and British Telecom was not £2 per minute, or £1 per minute, or even 1p per minute, it was nothing at all. Their costs were not for the call, but for the infrastructure over which the call was delivered, a transatlantic cable. If there was one call for ten minutes, once a week essentially at random, that cable must still exist, but if there are 10 thousand call minutes per week, a thousand times more, it's the same cable.
So the big telcos all just picked a number and understood it as basically free income. If everybody agrees this call costs £2 then it costs £2 right, and those 10 thousand call minutes generate a Million pound annual income.
It's maybe easier for Americans to understand if you tell them that outside the US the local telephone calls cost money back then. Why were your calls free? Because why not, the decision to charge for the calls is arbitrary, the calls don't actually cost anything, but you will need to charge somehow to recoup the maintenance costs. In the US the long distance calls were more expensive to make up for this for a time, today it's all absorbed in a monthly access fee on most plans.
In the US, ATT was just barely deregulated by then so the prices were not just 'out of thin air'.
Its successor TAT-8 carried ten times as many calls a few years later; industry professionals opined that there was likely no demand for so many transatlantic calls and so it would never be full. Less than two years later TAT-8 capacity maxed out and TAT-9 was already being planned.
Today lots of people have home Internet service significantly faster than all three of these transatlantic cables put together.
Prices will probably also drop if anyone ever works out how to feasibly compete with NVIDIA. Not an expert here, but I expect they're worried about competition regulators, who will be watching them very closely.
No, they won't. Because "AI assistants" are mostly wrapped around a very limited number of third-party providers.
And those providers are hemorrhaging money like crazy, and will raise the prices, limit available resources and cut off external access — all at the same time. Some of it is already happening.
It’s very expensive to create these models and serve them at scale.
Eventually the processing power required to create them will come down, but that’s going to be a while.
Even if there was a breakthrough GPU technology announced tomorrow, it would take several years before it could be put into production.
And pretty much only TSMC can produce cutting edge chips at scale and they have their hands full.
Between Anthropic, xAI and OpenAI, these companies have raised about $84 billion dollars in venture capital… VCs are going to want a return on their investment.
So it’s going to be a while…
How much has any of these decreased over the last 5 decades? The problem is that as of right now, LLM cost is linearly (if not exponentially) related to the output. It's basically "transferring energy" converted into bytes. So unless we see some breakthrough in energy generation, or better use of it, it will be difficult to scale.
This makes me wonder, would it be possible to pre-compute some kind of "rainbow tables" equivalent for LLMs? Either stored in the client or in the server, so as to reduce the computing needed for inference.
If you think about it, LLMs are used mostly when people are awake, at least right now. And when is the sun shining? Right. So, build a data-center somewhere where land is cheap and lots of solar panels can be build right next to it. Sure, some other energy source will be used for stability etc., but it won't be as expensive as the energy price for your home.
> This makes me wonder, would it be possible to pre-compute some kind of "rainbow tables" equivalent for LLMs?
Already happening. Read up on how those companies do caching prompt-prefixes etc.
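For the curious, this is roughly what opting into prefix caching looks like from the API side. A minimal sketch assuming Anthropic's documented prompt-caching feature; the model id, file name, and exact cache_control shape here are illustrative assumptions, not a statement of how the subscription products meter anything:

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Any big, unchanging prefix: project conventions, API docs, a codebase summary.
LONG_STATIC_CONTEXT = open("PROJECT_NOTES.md").read()  # hypothetical file

# Marking the prefix cacheable lets the provider reuse its precomputed state
# on subsequent calls instead of re-running prefill over the whole prefix.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # assumed model id; use whatever you run
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_STATIC_CONTEXT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Where is the retry logic implemented?"}],
)
print(response.content[0].text)
```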
I'd be curious to know how many tokens the average $200/mo user uses and what the cost on their end for it is.
That's why using the API directly and paying for tokens for anything past that basic usage feels a bit nicer, since it's my wallet that becomes the limitation then, not some arbitrary limits dreamed up by others. Plus with something like OpenRouter, you can also avoid subscription tier related limits like https://docs.anthropic.com/en/api/rate-limits#rate-limits
Though for now Gemini 2.5 Pro seems to work a bit better than Claude for my code writing/refactoring/explanation/exploration needs. Curious what other cost competitive options are out there.
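If anyone wants to try the pay-per-token route mentioned above: OpenRouter exposes an OpenAI-compatible endpoint, so switching is roughly the sketch below. The model slug is an assumption; check their catalog for current names and prices.

```python
from openai import OpenAI  # pip install openai

# OpenRouter speaks the OpenAI chat-completions protocol; point the client
# at their base URL and use an OpenRouter API key instead.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)

resp = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",  # assumed slug; any listed model works
    messages=[{"role": "user", "content": "Refactor this function to be iterative."}],
)
print(resp.choices[0].message.content)
```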
Except for one catastrophic binge where I accidentally left Opus on for a whole binge (KILL ME!!!), I use around $150/month. I like having the spigot off when I am not working.
Would the $100/month plan plus API for overflow come out ahead? Certainly on some months. Over the year, I don't know. I'll let you know.
I run Gemini Pro from within CC but I only use it for analysis and planning for which it is better than Claude (Opus).
I guess if your target language is Python or JS/TS etc., your mileage may be considerably better.
For Rust it's simply not true.
Note: all of them sometimes screw up applying diffs, but in general are good enough.
So the team at least seems to be aware of its shortcomings in that area and working to improve it with some success which I appreciate.
But you are correct that Gemini CLI still lags behind for whatever reason. It gets stuck in endless thought loops way too often for me, like maybe 1/15 tasks hits a thought loop burning API credits or it just never exits from the “Completing task, Verifying completion, Reviewing completion, Assessing completion status…” phase (watching the comical number of ways it rephrases it is pretty funny though).
Meanwhile I’ve only had maybe one loop over a period of a couple months using Gemini 2.5 Pro heavily in Roo Code with the most recent version so it seems like an issue with the CLI specifically.
Mask off completely and just make it completely usage based for everyone. You could do something for trial users like first 20 (pick your number here) requests are free if you really need to in order to get people on board. Or you could do tiered pricing like first 20 free, next 200 for X rate, next 200 for X*1.25 rate, and then for really high usage users charge the full cost to make up for their extreme patterns. With this they can still subsidize for the people who stay lower on usage rates for market share. Of course you can replace 200 requests with just token usage if that makes sense but I'm sure they can do the math to make it work with request limits if they work hard enough.
Offer better than open-router pricing and that keeps people in your system instead of reaching for 3rd party tools.
If your tool is that good, even with usage based it will get users. The issue is all the providers are both subsidizing users to get market share, but also trying to prohibit bad actors and the most egregious usage patterns. The only way this 100% becomes a non-issue is usage based for everything with no entry fee.
But this also hurts some who pay a subscription but DON'T use enough to account for the usage based fees. So some sales people probably don't like that option either. It also makes it easier for people to shop around instead of feeling stuck for a month or two since most people don't want multiple subs at once.
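The tiered scheme from a couple of comments up is simple enough to write down. A minimal sketch with invented numbers (20 free requests, 200 at a base rate, 200 at 1.25x, then full cost), just to show the mechanics:

```python
def tiered_price(requests: int,
                 base_rate: float = 0.05,
                 full_cost: float = 0.15) -> float:
    """Price a month of usage under the tiered scheme sketched above.

    Tiers (all numbers are illustrative, not Anthropic's):
      - first 20 requests free
      - next 200 at base_rate each
      - next 200 at 1.25 * base_rate each
      - everything beyond that at full_cost each
    """
    tiers = [
        (20, 0.0),                # free allowance
        (200, base_rate),         # subsidized tier
        (200, base_rate * 1.25),  # less subsidized tier
    ]
    total, remaining = 0.0, requests
    for size, rate in tiers:
        used = min(remaining, size)
        total += used * rate
        remaining -= used
    return total + remaining * full_cost  # heavy users pay the real cost

print(tiered_price(50))    # 30 paid requests at the base rate -> $1.50
print(tiered_price(1000))  # deep into the full-cost tier -> $109.50
```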
If I'm paying a flat rate, the only economic cost I am worrying about is "will this be faster than me doing it myself if it fails once or twice?"
If I am paying per token, and it goes off for 20 minutes without solving the problem, I've just spent $$ for no result. Why would I even bother using it?
For something like Claude Code, that's an even more concerning issue - how many background tasks have to fail before I reach my monthly spending limit? How do I get granular control to say "only spend 7 dollars on this task - stop if you cannot succeed." - and I have to write my own accounting system for whether it succeeds or fails.
I think that you should just subscribe to a preset allotment of tokens at a certain price, or a base tier with incremental usage costs for models that aren’t tiny (like paid per minute “long distance calling”).
I use an LLM tool that shows the cost associated with each message/request and most are pennies each. There’s a point where the friction of paying is a disincentive to using it. Imagine you had to pay $0.01 every time you Google searched something? Most people would never use the product because trying to pay $0.30/mo for one day a month of usage is annoying. And no one would want to prepay and fund an account if you weren’t familiar with the product. No consumer likes micro transactions
No one wants to hear this, but the answer is advertising and it will change the game of LLMs. Once you can subsidize the lowest end usage, the incentive for businesses to offer these $20 subscriptions will change, and they’d charge per-usage rates for commercial users.
The problem is that there's no way to gauge or control token usage.
I have no idea why Claude Code wrote that it consumed X tokens now, and Y tokens later, and what to do about it
I'm a fan of having both a subscription and a usage based plan available. The subscription is effectively a built in spending limit. If I regularly hit it and need more value, I can switch to an API key for unlimited usage.
The downside is you are potentially paying for something you don't use, but that is the same for all subscription services.
But I have slow months and think that might not actually be the winner. Basically I'm going to wait and see before I sign up for auto-pay.
Maybe that reflects higher underlying costs. Maybe their API prices are just inflated.
you can already pay per token by giving Claude Code an API key, if you want.
thus, the subtext of every complaint on this thread is that people want "unlimited" and they want their particular use to be under whatever the cap is, and they want it to be cheap.
No wonder that access to an expensive API which is an LLM is also rate-limited.
What does surprise me is that you can't buy an extra serving by paying more (twice the limit for 3x the cost, for instance). Either subscriptions don't make enough money, or their limits are at their datacenters and they have no spare capacity for premium plans.
LLMs will become more efficient, GPUs, memory and storage will continue to become cheaper and more commonplace. We’re just in the awkward early days where things are still being figured out.
My biggest issue is local models I can run on my m1/m4 mbp are not smart enough to use tools consistently, and the context windows are too small for iterative uses.
The last year has seen a lot of improvement in small models though (gemma 3n is fantastic), so hopefully it’s only a matter of time.
I'm assuming it'll get updated to include these windows as well. Pass in "blocks --live" to get a live dashboard!
ETA: You don’t need to authenticate or share your login with this utility, basically zero setup.
See that screenshot. It certainly shows you when your 5 hour session is set to refresh, in my understanding it also attempts to show you how you're doing with other limits via projection.
For example Stack Overflow used to handle all their traffic from 9 on-prem servers (not sure if this is still the case). Millions of daily users. Power consumption and hardware cost is completely insignificant in this case.
LLM inference pricing is mostly driven by power consumption and hardware cost (which also takes a lot of power/heat to manufacture).
They just finished their migration to the cloud, unracked their servers a few weeks ago https://stackoverflow.blog/2025/07/16/the-great-unracking-sa...
The infrastructure and hardware costs are seriously more costly than typical internet apps and storage.
Well, it is a limited resource, I'm glad they're making that clear.
Lots of things still have usage-based pricing (last I checked no gas stations are offering "all you can fill up" specials), and those things work out fine.
Unless/until I start having problems with limits, I'm willing to reserve judgment. On a max plan, I expect to be able to use it throughout my workday without hitting limits. Occasionally, I run a couple instances because I'm multitasking and those were the only times I would hit limits on the 5x plan. I can live with that. I don't hit limits on the 20x plan.
The stuff that we do now, my 13 year old self in 1994 would never dream of! When I dialed my 33.6kbps modem and left it going the whole night, to download an mp3.
It's exciting that nowadays we complain about Intelligent Agents bandwidth plans!! Can you imagine! I cannot imagine the stuff that will be built when this tech has the same availability as The Internet, or POTS!
Opus at 24-40 looks pretty good too. A little hard to believe they aren't losing a bunch of money still if you're using those limits tbh.
You can make your own comparison to however many hours you usually spend working in a week and how many sessions you have active on average during that time.
I don't really know how it's sustainable for something like SOTA LLMs.
Once enough developers are addicted to AI assisted coding the VCs will inevitably pull the rug.
I wonder if Alibaba will put out a 100B A10B coder model which could probably run for $0.5/M while giving decent output. That would be easily affordable for most developers/companies.
If anything pops this bubble, it won’t be ethics panels or model tweaks but subscription prices finally reflecting those electricity bills.
At that point, companies might rediscover the ROI of good old meat based AI.
That’s like saying when the price of gasoline gets too high, people will stop driving.
Once a lifestyle is based on driving (like commuting from the suburbs to a job in the city), it’s quite difficult and in some cases, impossible without disrupting everything else.
A gallon of gas is about 892% higher in 2025 than it was in 1970 (not adjusted for inflation) and yet most people in the US still drive.
The benefits of LLMs are too numerous to put that genie back in the bottle.
We’re at the original Mac (128K of RAM, 9-inch B&W screen, no hard drive) stage of LLMs as a mainstream product.
People get electric cars or public transport....
> Adjusting for long-term ridership trends on each system, seasonal effects, and inertia (the tendency for ridership totals to persist from one month to the next), CBO estimates that the same increase of 20 percent in gasoline prices that affects freeway traffic volume is associated with an increase of 1.9 percent in average system ridership. That result is moderately statistically significant: It can be asserted with 95 percent confidence that higher gasoline prices are associated with increased ridership.
https://www.cbo.gov/sites/default/files/110th-congress-2007-...
Though some of us might fall into the NS category instead.
It suggests:
Transparent queueing - Instead of blocking, queue requests with clear wait time estimates. Users can choose to wait or reschedule.
Usage smoothing - Soft caps with gradually increasing response times (e.g., 2s → 5s → 10s) rather than hard cutoffs.
Declared priority queues - Let users specify request urgency. Background tasks get lower priority but aren't blocked.
Time-based scheduling - Allow users to schedule non-urgent work during off-peak hours at standard rates.
Burst credits - Banking system where users accumulate credits during low usage periods for occasional heavy use.
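The burst-credits idea maps naturally onto a token bucket that accrues while you're idle. A minimal sketch with invented numbers, just to illustrate the mechanic:

```python
import time

class BurstCredits:
    """Token bucket that banks unused capacity for occasional heavy use.

    accrual_per_hour: credits earned while idle or under-using
    cap: maximum credits that can be banked (so nobody hoards forever)
    """
    def __init__(self, accrual_per_hour: float = 100.0, cap: float = 2000.0):
        self.accrual_per_hour = accrual_per_hour
        self.cap = cap
        self.credits = 0.0
        self.last = time.monotonic()

    def _accrue(self) -> None:
        now = time.monotonic()
        hours = (now - self.last) / 3600.0
        self.credits = min(self.cap, self.credits + hours * self.accrual_per_hour)
        self.last = now

    def try_spend(self, cost: float) -> bool:
        """Spend banked credits on a request; False means fall back to the
        normal (slower / queued / capped) path rather than a hard block."""
        self._accrue()
        if self.credits >= cost:
            self.credits -= cost
            return True
        return False
```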
I know nobody else really cares.. In some ways I wish I didn't think like this.. But at this point it's not even an ethical thing, it's just a weird fixation. Like I can't help but feel we are all using ovens when we would be fine with toasters.
> "Most Max 20x users can expect 240-480 hours of Sonnet 4 and 24-40 hours of Opus 4 within their weekly rate limits."
In this post it says:
> "Most Max 5x users can expect 140-280 hours of Sonnet 4 and 15-35 hours of Opus 4 within their weekly rate limits."
How is the "Max 20x" only an additional 5-9 hours of Opus 4, and not 4x that of "Max 5x"? At least I'd expect a doubling, since I'm paying twice as much.
Transformer self-attention costs scale roughly quadratically with context window size. Servicing prompts in a 32k-token window uses much more compute per request than in an 8k-token window.
A Max 5× user on an 8k-token window might exhaust their cap in around 30 hours, while a Max 20× user on a 32k-token window will exhaust theirs in about 35 to 39 hours instead of four times as long.
If you compact often, keep context windows small etc, I'd wager that your Opus 4 consumption would approach the expected 4× multiplier... In reality, I assume the majority of users aren't clearing their context windows and just letting the auto-compact do its thing.
Visualization: https://codepen.io/Sunsvea/pen/vENyeZe
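Spelling out the quadratic claim above (this is the prefill attention term only; real serving cost also has components that scale linearly with context, so the true per-request gap is smaller than this bound):

```latex
\text{attention FLOPs per full-window request} \;\propto\; n^{2}
\quad\Rightarrow\quad
\frac{\text{cost}(32\text{k})}{\text{cost}(8\text{k})}
  \approx \left(\frac{32{,}768}{8{,}192}\right)^{2} = 16
```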
That is true both on a relative scale ("20x") compared to my previous use of the $20 plan, but - more dishonestly, in my opinion - absolutely false when comparing my (minimal, single-session, tiny codebase) usage to the approximate usage numbers quoted in the marketing materials. The actual usage provided has regularly been 10% of the quoted allowance before caps are hit.
I have had responses from their CS team, having pointed this out, in the hope they would _at least_ flag to users times that usage limits are dramatically lower so that I can plan my working day a little better. I haven't received any sort of acknowledgement of the mismatch between marketing copy and delivered product, beyond promised "future fixes". I have, of course, pointed out that promised and hypothetical future fixes do not have any bearing on a period of paid usage that exists in the past. No dice!
I'm, unfortunately, a UK customer, and from my research any sort of recourse is pretty limited. But it has - without question - been one of the least pleasant customer experiences I've had with a company in some time, even allowing for Anthropic experiencing extremely high growth.
Claude Code Router has been a Godsend for my usage level. I'm not sure I can justify the time and effort to care and pursue Anthropic's functional bait-and-switch offering more than I already have, because being annoyed about things doesn't make me happy.
But I completely second this: it's not acceptable to sell customers a certain amount of a thing - and then deliver another - and I hope US customers (who I believe should have more recourse) take action. There are few other industries where "it's a language and compute black box!" would be a reasonable defence, and I think it sets a really bad precedent going forward for LLM providers.
One might imagine that Anthropic's recent ~$200m US gov contract (iirc) might allow for a bit of spare cash to, for example, provide customers with the product they paid for (let alone refund them, where necessary) but that does not seem to be the case.
It makes me sad to see a great product undermined like this, which is, I think, a feeling lots of people share. If anyone is actually working towards wider recourse, and would find my (UK) usage data useful, they're very welcome to get in touch.
https://www.anthropic.com/pricing
> Choose 5x or 20x more usage per session than Pro*

That first bullet pretty clearly implies 4x the usage and the last one implies that Max gets priority over Pro, not that 20x gets priority over 5x.
If a recruiter tells you you'll be getting "20x more money per hour" at this new startup, and you go there and you get only 6x, you're going to have a very different tone than "you sort of implied 20x".
Max plan
5x more usage than Pro $100.00/month + tax
Save 50% 20x more usage than Pro $200.00/month + tax
Especially with the "save 50%", if they're not actually offering 4x that of 5x, that's easily illegal false advertising in half the territories Anthropic's customers are located in.
I think the disconnect here is that the 5x or 20x is true within a single session (and you'll see their website seems to always say this, clearly their legal team went over it with a fine tooth comb). The above about weekly quotas etc., isn't within a single session so the 5 or 20x no longer applies.
Gross.
NOTHING breaks flow better than "Whoops! Time's up!"; it's worse than credit quotas -- at least then I can make a conscious decision to spend more money or not towards the project.
This whole 'twiddle your thumbs for 5 hours while the gpus cool off' concept isn't productive for me.
'35 hours' is absolutely nothing when you spawn lots of agents, and the damn thing is built to support that behavior.
I wouldn't call "spawning a lot of agents" to be a typical use case of the personal plan.
That was always in the domain of switching to a pay as you go API. It's nice that they allowed it on the fixed rate plans, but those plans were always advertised as higher limits, not unlimited.
Slowly bringing up prices as people get dependent sounds like a pretty decent strategy if they have the money to burn
It’s more likely that this sum is higher than they want. So really it’s not about predictability.
- a user subsidizing other users
- a user subsidized by other users
I don't know what OP prefers, but given that people are saying "woof, API pricing too expensive", it sounds like the latter.
The problem, of course, is the provider has to find a market where the one sustains the other. Are there enough users who would pay > $200/mo without getting their money's worth in order to subsidize users paying the same rate, but using more than the average? I think the non-existence of a higher-tier plan says there probably isn't, but I don't want to give too much credence to markets, economics, etc.
https://docs.anthropic.com/en/api/rate-limits#requirements-t...
Using the $20 Pro sub and for anything above Hello World project size, it's easy to hit the 5 hour window limit in just 2 hours. Most of the tokens are spent on Claude Code's own stupidity and its mistakes quickly snowballing.
1. Set up your dozens of /\.?claude.*\.(json|md)/i dotfiles?
2. Give insanely detailed prompts that took longer to write than the code itself?
3. Turn on auto-accept so that you can only review code in one giant chunk in diff, therefore disallowing you to halt any bad design/errors during the first shot?
> ...easy to hit the 5 hour window limit in just 2 hours
I've had this experience. Sucks especially when you're working in a monorepo because you have client/server that both need to stay in context.
One user consumed tens of thousands in model usage on a $200 plan. Though we're developing solutions for these advanced use cases, our new rate limits will ensure a more equitable experience for all users while also preventing policy violations like account sharing and reselling access.
This is why we can’t have nice things.
It's amazing how fast you go from thinking nobody could ever use that much of your service to discovering how many of your users are creatively abusing the service.
Accounts will start using your service 24/7 with their request rate coming in at 95% of your rate limiter setting. They're accessing it from a diverse set of IPs. Depending on the type of service and privacy guarantees you might not be able to see exactly what they're doing, but it's clearly not the human usage pattern you intended.
At first you think you can absorb the outliers. Then they start multiplying. You suspect batches of accounts are actually other companies load-splitting their workload across several accounts to stay under your rate limits.
Then someone shows a chart of average profit or loss per user, and there's a giant island of these users deep into the loss end of the spectrum consuming dollar amounts approaching the theoretical maximum. So the policy changes. You lose those 'customers' while 90+% of your normal users are unaffected. The rest of the people might experience better performance, lower latencies, or other benefits because the service isn't being bombarded by requests all day long.
Basically every startup with high usage limits goes through this.
Essentially people had all their security cameras and PVR units uploading endlessly to the cloud and Microsoft was footing the bill.
Then the 1TB limit came in to stop that.
It's nice to have an unlimited tier where there's no limit but you get your hand slapped when you go beyond reasonable. But people abuse shit like this and now lawyers have to get involved and we can't have the nice thing anymore.
Worked great for years, decades even, until crypto miners caught on - and maxed out the usage. Ruined it for the other 99.99% of renters.
clearly that's abusive and should be targeted. but in general idk how else any inference provider can handle this situation.
cursor is fucked because they are a whole layer of premium above the at-cost of anthropic / openai etc. so everyone leaves and goes to cc. now anthropic is in the same position but they can't cut any premium off.
you can't practically put a dollar cap on monthly plans because they are self exposing. if you say 20/mo caps at 500/mo usage then that's the same as a 480/500 (96%) discount against raw API calls. that's obviously not sustainable.
there's a real entitled chanting going on too. i get that it sucks to get used to something and have it taken away but does anyone understand that just the capex/opex alone is unsustainable, let alone the R&D to make the models and tools.
I’m not really sure what can be done besides a constant churn of "fuck [whoever had to implement sustainable pricing], i'm going to [next co who wants to subsidize temporarily in exchange for growth]".
i think it's shitty the way it's playing out though. these cos should list these as trial periods and be up front about subsidizing. people can still use and enjoy the model(s) during the trial, and some / most will leave at the end, but at least you don't get the uproar.
maybe it would go a long way to be fully transparent about the capex/opex/R&D. nobody is expecting a charity, we understand you need a profit margin. but it turns it from the entitled "they're just being greedy" chanting to "ok that makes sense why i need to pay X to have 1+ tireless senior engineers on tap".
You can't abuse a company by buying their services and using them according to their own terms and conditions. The T&C is already stacked against you; you're in a position of no leverage.
The correct solution is what Anthropic is doing here - change the T&C so you can make money. If you offer unlimited stuff, people will use it... unlimitedly. So, don't let them call your bluff.
Because of that, IMO end-users can't abuse the contract, no matter how hard they try. It's not on them to do that, because they have zero control over the contract. It's a have-your-cake-and-eat-it-too problem.
Anthropic simultaneously retains complete control of the contract, but they want to "outsource" responsibility for how it's used to their end-users. No... it's either one or the other. Either you're in complete control and therefore hold complete accountability, or you share accountability.
end users did have power. the power to use the service legitimately, even as a power user. two choices were possible, with the users given the power to decide:
1. use it for an entire 8 hour workday with 1-2 agents at most - limited by a what a human could possibly entertain in terms of review and guidance.
2. use it for 24 hours a day, 7 days a week with recursive agents on full opus blast. no human review could even be possible with this much production. it's the functional equivalent of one person managing a team of 10-20 cracked engineers on adderall that pump out code 24 hours a day.
the former was the extreme of a power user with a practical deliverable. the latter is a circus whose sole purpose is to push the bounds and tweet about it.
now the lawyers get some fresh work to do and everyone gets throttled. oh and that 2nd group? they'll be, and are, the loudest about how they've been "rug pulled just like cursor".
"you're not wrong, you're just an asshole" - the dude to walter.
(no particular offense directed, the you here is of course the "royal you").
Look, in my view, Anthropic made a mistake. And that's okay, we all do.
But I'm not going to let a multi-billion dollar company off the hook because some nobodies called them out on their bluff. No, Anthropic made the mistake, and now they're fixing it.
Ultimately, this came out of greed - but not the greed of the little people. Anthropic chose aggressive pricing because, like all somewhat large corporations, they usually opt for cheating instead of winning by value. What I mean is, Anthropic didn't strive for The Best product, they instead used their capital as collateral to sell a service at loss to squeeze competitors, particularly small, non-incumbent ones.
And, that's fine, it's a legitimate business strategy. Walmart does it, Amazon does it, whatever. But if that backfires, I don't care and I won't extend sympathy. Such a strategy is inherently risky. They gambled, people called their bluff, and now they're folding.
I’m not suggesting you be sympathetic to Anthropic. I'm suggesting sympathy for people who were using it legitimately, such as myself and others in areas where $200/mo is an extraordinary commitment, and who were not blind to, but appreciative of, their subsidizing the cost.
the core of my position is, was it necessary for people to use it wastefully because they could? what was gained from that activity? sticking it to that greedy corporation? did it outweigh what was lost to the other 95%+ of users?
i don't think we're debating from compatible viewpoints. i maintain it's not wrong, just abusive. you maintain it's not wrong, it is [was] allowed. so be it.
the party's over anyways. the result is an acceleration on the path of normalizing the true cost of usage and it's clear that will unfortunately, or maybe justifiably in your eyes, exclude a lot of people who can't afford it. cheers man.
Do you have a link?
I'm always curious to see these users after working at a startup that was the target of some creative use from some outlier customers.
not the tweet but here's a leaderboard of claude clowns bragging about their spend. maybe you can find their handles and ask them what MRR they hit spending $500k (not a typo) in credits.
https://www.viberank.app/
who knows, just something i came across when trying to find the twitter thread.
pointing to the most extreme example as if you can't stop it in its tracks is a bad argument. it's like saying we will now restrict sending of emails for everyone because this one spammer was sending 1000x the volume of an average or even power user, when you should just be solving the actual problem (identifying and stopping those that disrupt).
> This is why we can’t have nice things.
We're living in the worst world that Stallman could have predicted. One in which even HN agrees that people shouldn't be allowed to share or resell what they pay for.
All AI companies are hitting the same thing and dealing with the same play - they don't want users to think about cost when they're prompting, so they offer high cost flat fee plans.
The reality is though there will always be a cohort of absolute power users who will push the limits of those flat fee plans to the logical extremes. Startups like Terragon are specifically engineered to help you optimize your plan usage. This causes a cat and mouse game where they have to keep lowering limits as people work around them, which often results in people thinking about price more, not less.
Cursor has adjusted their limits several times, now Anthropic is, others will soon follow as they decide to stop subsidizing the 10% of extreme power users.
Just offer metered plans that let me use the web interface.
The "unlimited" lasted less than a week - it's been a shit show of cutting limits down ever since.
I'm reminded of online storage plans with various levels of "unlimited" messaging around them that can't even hold a single medium to large hard drive of data. Very few users hit that, most of whom don't even have a hard drive they regularly use, but it means they shouldn't be going anywhere near the word "unlimited".
will they refund me my sub?
when I subbed it was unlimited; they've rugged the terms twice since then in less than a month
Read the announcement. You are getting a full month's notice. If you don't like the limits, don't renew your subscription. Of course that doesn't help if your primary goal is to be an online outrage culture warrior.
Where did you see unlimited usage? The Max plan was always advertised as higher limits, not unlimited usage.
yes it was unlimited. so is the public water fountain. but if you show up and hold the button down to run nonstop while chanting "it says unlimited free water doesn't it??" you must expect that it will no longer be unlimited.
we went from reasonably unlimited, which 95% of users enjoyed, respected and recognized was subsidized, to no unlimited anymore because 5% wanted to abuse it. now you can scream about being rugged, just like you did for cursor, and jump to the next subsidized provider that you can abuse until there's none left. you do realize that every time "unlimited" gets abused it raises the standard of limits and pricing across the board until it becomes normalized. this was going to happen anyways on a longer timeframe where providers could optimize inference and models over time so the change wasn't so shocking, but abuse accelerated it.
The problem is this would reveal how expensive it _actually_ is to serve inference right now at the scale that people use it for productive things.
Last Friday I spent about $15 in 1 hour using claude code with API key, and the code doesn't really work, even though all the unit tests passed. I am not going to touch it for weeks, while the loss is fresh in my mind.
With a subscription though, you can keep on gambling, until you get a hit.
I have no idea if I’m in the top 5% of users. Top 1% seems sensible to rate limit, but top 5% at most SaaS businesses is the entire daily-active-users pool.
It’s an all you can eat buffet, you’re just not allowed takeout!
My issue is: a request made during peak usage is treated the same as a request made during low usage times even though I might not be able to get anything useful/helpful out of the LLM during those busy hours.
I've talked with coworkers and friends who say the same.
This isn't a problem with Claude specifically - seems to happen with all the coding assistants.
$100 doesn't even cover the electricity of running the servers every night, they were abusing a service and now everyone suffers because of them.
I don't know what there is to be mad about, or why you'd use dramatic language like "everyone suffers because of them".
This is clearly what was happening with the most extreme Claude Code users, because it's not actually that smart yet and still requires a human to often be in the loop.
However, Anthropic can't really identify "wasted code".
The price simply did not reflect the cost, and that's a problem. It happens to a lot of businesses, and sometimes consumers call their bluff. Whoops.
You wanna cheat and undercut competitors by shooting yourself in the foot with costs that exceed price? Fine. It's a tale as old as time. Here, have your loss leader - xoxo, every consumer.
Just charge per unit.
The tragedy of the commons is the concept that, if many people enjoy unfettered access to a finite, valuable resource, such as a GPU farm, they will tend to overuse it and may end up destroying its value altogether.
That is exactly what happened here. The price was fine if everyone upheld their moral obligation not to abuse it.
There's only one party that made a mistake here - Anthropic. They purposefully made their terms and conditions this way, and then when people played by the contract they set forth, they lost money. It's calling a bluff.
Anthropic purposefully priced this far too aggressively to undercut their competitors. Companies with stupid amounts of investor capital do that all the time. They flew too close to the sun.
You can't create a contract, have all the power in the world to rig the contract in your favor, and then complain about said contract. Everyone was following the rules. The problem was the rules were stupid.
To be more specific - abuse requires an exercise of power. End-users have no power at all. They have literally zero leverage over the contract and they have no power to negotiate. They can't abuse anything, they're too weak.
Again, there is no moral obligation to ensure Anthropic's business goes well and conveniently.
if your actions are defined by legal ToS then no, they didn't do anything wrong. they paid, it's the company's fault for not expecting someone to use 50-100x a reasonable usage.
if your actions are defined by ethical use then you understood that 50-100x the use would inevitably lead to ruining the party for everyone.
it's like a buffet. everyone pays a flat price and can enjoy a big meal for themselves. maybe sometimes having a few extra pieces of cake here and there. until someone shows up and starts stacking plates well beyond any reasonable expectation (not rule based) of a buffet customer. what's the result? stringent rules that are used for legal rather than rational enforcement.
it's obvious that even "reasonable use" is being subsidized, and the company was okay with doing so. until we have people running 10 opus instances in a gluttonous orchestra of agents just because they can. now the party is over. and i'd love to know what these claude agencies were even producing running 24/7 on opus. i can't imagine what human being even has the context to process what 24/7 teams of opus can put out. much like i can't imagine the buffet abuser actually enjoying the distending feast. but here we are.
Why are you assuming everyone will suffer?
They backtested the new limits on usage data and found it will begin to impact less than 5% of users.
But a compute-focused datacenter is probably not paying more than 10 cents per kWh, so $100 would pay for more than a 24/7 kilowatt of GPU plus cooling plus other overhead.
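Rough numbers behind that claim, as a sketch assuming ~$0.10/kWh for datacenter power:

```python
# Back-of-envelope: one kilowatt drawn continuously for a 30-day month
# at an assumed $0.10/kWh rate.
price_per_kwh = 0.10
kwh_per_month = 1 * 24 * 30          # 720 kWh
print(price_per_kwh * kwh_per_month)  # -> 72.0, i.e. ~$72 of a $100 plan
```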
I’m just curious how this decision came about. In most cases, I’ve seen either daily or monthly limits, so the weekly model stood out.
Option 1: You start out bursting requests, and then slow them down gradually, and after a "cool-down period" they can burst again. This way users can still be productive for a short time without churning your servers, then take a break and come back.
Option 2: "Data cap": like mobile providers, a certain amount of full-speed requests, and after that you're capped to a very slow rate unless you pay for more. (this one makes you more money; a rough sketch of this follows after Option 3)
Option 3: Infrastructure and network level adaptive limits. You can throttle process priority to de-prioritize certain non-GPU tasks (though I imagine the bulk of your processing is GPU?), and you can apply adaptive QoS rules to throttle network requests for certain streams. Another one might be different pools of servers (assuming you're using k8s or similar), and based on incoming request criteria, schedule the high-usage jobs to slower servers and prioritize faster shorter jobs to the faster servers.
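For what it's worth, Options 1 and 2 both reduce to a token-bucket shape: burst up front, then a slow refill acts as the cool-down or the post-cap throttle. A minimal sketch with made-up numbers:

```python
# Rough sketch of Options 1-2: allow a burst, then throttle until the bucket
# refills over a cool-down window. All constants here are illustrative only.
import time

class BurstThenThrottle:
    def __init__(self, burst_tokens: float = 100.0, refill_per_sec: float = 0.01):
        self.capacity = burst_tokens          # how much a user can burst before slowing down
        self.tokens = burst_tokens
        self.refill_per_sec = refill_per_sec  # slow refill = the "cool-down period"
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the cap: reject, or route to a slow lane (Option 2 style)
```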
And aside from limits, it's worth spending a day tracing the most taxing requests to find whatever the least efficient code paths are and see if you can squash them with a small code or infra change. It's not unusual for there to be inefficient code that gives you tons of extra headroom once patched.
Probably phrased to sound small, but as someone used to seeing 99% uptime (or, conversely, 1% down) as bad and affecting lots and lots of users, this feels massive. If you have half a million users (I have no idea, just a ballpark guess), then you're saying this will affect just shy of 25 thousand people - the ones who use your product the most. Oof!
(Congrats on 777 karma btw :). No matter the absolute number on sites like these, I always still love hitting palindromes or round numbers or such myself)
Seems like some people are account-sharing or scripting/repackaging to such an extent that they were able to "max out" the rate limit windows.
Ultimately - this all gets priced in over time; whether that's in a subscription change or overall rate limit change, etc.
So if you want to simply use it as intended, over time stopping this kind of pattern is better for us?
Some stuff I’ve used it for in the last day: figuring out what a family member needs for FAFSA as a nontraditional student, help identify and authenticate some rare first editions and incunabula for a museum collection I volunteer at, find a list of social events in my area (based on my preferences) that are coming up in the next week (Chatgpt Agent works surprisingly well for this too), adapting Directus and Medusa to my project’s existing schema and writing up everything I need to migrate, and so on.
Deep research really hits the Claude limits hard and that’s the best way to avoid hallucinations when asking an important question or making it write complex code. I just switch from Claude to ChatGPT/Gemini until the limits reset but Claude’s deep research seems to handily beat Gemini (and OpenAI isnt even in the running). DR queries take much longer (5-10 min in average) but have much more in depth and accurate answers.
I can see how work involving larger contexts and deeper consideration would lead to exhausting limits a lot faster though, even if you aren't using it like a slot machine.
Isn't this something you can do with a simple Google search? Or Perplexity?
No need to shove by far the most expensive LLM (Claude Opus 4) at it.
Collate all the LA Metro area events from different sources and whip up an app or web site where people can filter them and subscribe to the events in Google Calendar or in .ical format.
You can even have Claude vibe code it for you :)
Then not using the cannonball is just a waste of time, which is a heck of a lot more valuable than some purist aversion to using LLMs to save time and effort.
I know LLMs aren't as much of an environmental scourge as people sometimes make them out to be, but if they're used eagerly and aggressively, their impacts certainly have a capability of scaling in concerning ways.
I assume that the people hitting limits are just letting it cycle, but doesn't that just create garbage if you don't keep it on a tight leash? It's very eager but not always intelligent.
The issue could be, in part, that a lot of users don't care to be efficient with token usage and maintaining condensed, efficient, focused contexts to work with.
I have two problems with that. Firstly, I want my code to be written a particular way, so if it's doing something out of left field then I have to reject it on stylistic grounds. Secondly, if its solution is too far from my expectation, I have to put more work into review to check that its solution is actually correct.
So I give it a "where, what, how" prompt. For example, "In file X add feature Y by writing a function with signature f(x: t), and changing Z to do W..."
It's very good at following directions, if you give it the how hints to narrow the solution space.
I haven't yet seen anyone doing anything remarkable with their extensive use of Claude. Without frequent human intervention, all of it looks like rapid regression to the mean, or worse.
I see so many folks claiming crazy hardware rigs and performance numbers so no idea where to begin. Any good starting points on this?
(Ok, budget is TBD - but seeing a "you get X for $Y" would at least help make an informed decision).
- 2x 4070 Ti (32 GB total VRAM) - $2200
- 64 GB RAM - $200-250
- Core i9/Ryzen 9 CPU - $450
- 2 TB SSD - $150
- Motherboard, cooler, case, PSU - $500-600
Total - ~$3500-3700, say $4000 with extras.
I'm curious how much lower quality we're talking about here. Most of the work I ever get an LLM to do is glue code, or trivial features. I'd expect some fine-tuned Codestral-type model with well-focused tasks could achieve good performance locally. I don't really need world-leading-expert-quality models to code up a hamburger menu in a React app & set the background-color to #A1D1C1.
My other worry about the Mac is how non-upgradable it is. Again, not sure how fruitful it is - in my (probably fantasy-land) view, if I can set up a rig and then keep updating components as needed, it might last me a good 5 years, say for $20k over that period? Or is that too hopeful?
So for $20k over 5 years, or $4k per year, it comes to about $330 a month (ish) - somewhere between one and two $200/mo Max subscriptions. Let's be honest: right now, with these limits, running more than one in parallel is going to be forbidden.
if I can run 2 Claude-level models (assuming the DS and Qwens are there) then I am already breaking even, but without having to participate in training with all my codebases (and I assume being free of that actually unlocks something new in the process).
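As a quick sanity check on the amortization above (same assumed figures):

```python
# Amortizing the rig-plus-upgrades budget from the comment above.
total_spend = 20_000      # assumed total over the period
years = 5
per_month = total_spend / (years * 12)
print(round(per_month))   # -> 333, i.e. roughly $330-350 a month
```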
https://www.reddit.com/r/LocalLLaMA/comments/1iqpzpk/8x_rtx_...
You can look for more rig examples on that subreddit.
I imagine there are also going to be some problems hooking something like that up to a normal wall socket in North America? (I, like the reddit poster, am in Europe, so on 220V)
I use 208V power, but 120V can indeed be a challenge. That said, it's a bit of a misunderstanding of how US power works: the USA has split-phase wiring, so every house has 220-240V on tap if they need it - it's just that typical outlets are 110-120V.
1. switch models using /model
2. message
3. switch back to opus using /model
Help me help you (manage usage) by allowing me to submit something like "let's commit and push our changes to github #sonnet". Tasks like these rarely need opus-level intelligence and it comes up all the time.
https://docs.anthropic.com/en/docs/claude-code/sub-agents
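For what it's worth, the sub-agents linked above look like one way to get at this today: a minimal sketch of a project-level sub-agent that handles commit/push chores. This assumes the Markdown-with-frontmatter format from those docs (name/description/tools); the `model` field in particular is an assumption on my part and may not exist or be spelled this way.

```markdown
---
# Hypothetical example in .claude/agents/committer.md; field names follow the
# sub-agent docs linked above, and the `model` field is an assumption.
name: committer
description: Handles routine git chores (commit, push) so the main session can stay on Opus for harder work.
tools: Bash, Read, Grep
model: sonnet
---
You handle git housekeeping only. Write concise commit messages, never amend
history, and push to the current branch unless told otherwise.
```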
We don't know what the limits are, what conditions change the limits dynamically, and we cannot monitor our usage towards the limits.
1. 5 hour limit
2. Overall weekly limit
3. Opus weekly limit
4. Monthly limit on number of 5 hour sessions
I also have to wonder how much Sub Agents and MCP are adding to the usage; sub agents are brand new and won't even be reflected in that 95% statistic.
At the end of this email there are a lot of unknowns for me (am I in the 5%? will I get cut off? am I about to see my usage increase now that I added a few sub agents?). That's not a good place to be as a customer.
It has become a kind of goal to hit it twice a day. It means I've had a productive day and can go on and eat food, touch grass, troll HN, read books.
I'm on Claude Code after hitting Cursor Pro's limit for the month. It makes more sense to subscribe to a bunch of different tools at $20/month each than to spend $100/month on one tool that throws overloaded errors. We'll probably get more uptime with the weekly restriction.
I'll keep OpenAI - they don't even let me use CLIs with it, but they're at least honest about their offerings.
Also, their app doesn't ever tell you to go fuck off if you're Pro.
I'd be pretty surprised if I were to get rate limited, but I do use it a fair amount and really have no feel for where I stand relative to other users. Am I in the top 5%? How should I know?
https://openrouter.ai/z-ai/glm-4.5
It's even possible to point Claude Code CLI to it
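A hedged sketch of what "pointing Claude Code at it" could look like. Assumptions: OpenRouter's native API is OpenAI-style, so some Anthropic-compatible proxy has to sit in between, and Claude Code honors the ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN environment variables for a custom gateway; the localhost URL below is a placeholder, not a real endpoint.

```python
# Minimal sketch, not a recipe: launch the Claude Code CLI against a local
# proxy that fronts GLM-4.5 with an Anthropic-compatible API. The URL and the
# key lookup below are assumptions/placeholders.
import os
import subprocess

env = os.environ.copy()
env["ANTHROPIC_BASE_URL"] = "http://localhost:4000"                # hypothetical translating proxy
env["ANTHROPIC_AUTH_TOKEN"] = os.environ.get("OPENROUTER_API_KEY", "")

# One-off, non-interactive run (`claude -p` is print mode).
subprocess.run(["claude", "-p", "explain what this repo does"], env=env, check=False)
```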
The one thing that bugs me is the lack of visibility into how far through your usage you are. Only being told when you're close to the end means I cannot plan. I'm not expecting an exact %, but a few notices at intervals (e.g. halfway through) would help a lot. Not providing this kinda makes me worry they don't want us to measure. (I don't want to measure closely, but I do want a sense of where I'm at.)
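Even something as coarse as this would help. Purely illustrative, since neither a running total nor the cap is exposed today; both numbers below are made up:

```python
# The kind of interval notices the comment asks for, assuming the client could
# see a running token total and a known weekly cap (both hypothetical here).
WEEKLY_CAP_TOKENS = 10_000_000          # hypothetical cap
THRESHOLDS = (0.5, 0.8, 0.95)           # notify at 50%, 80%, 95%

def usage_notices(used_tokens: int, already_notified: set[float]) -> list[str]:
    """Return any new notices for thresholds crossed since the last check."""
    notices = []
    frac = used_tokens / WEEKLY_CAP_TOKENS
    for t in THRESHOLDS:
        if frac >= t and t not in already_notified:
            already_notified.add(t)
            notices.append(f"Heads up: you've used {t:.0%} of this week's allotment.")
    return notices

# Example: 8.2M tokens used -> emits the 50% and 80% notices.
seen: set[float] = set()
print(usage_notices(8_200_000, seen))
```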
Who does that benefit? Does number of accounts beat revenue in their investor reports?
Waiting for higher valuations till someone pulls the trigger for acquisition.
I don't see IPOs being successful, because not everyone gets a conman like Elon as their frontman who can consistently inflate the balloon with unrealistic claims for years.
Is this way too complicated? It feels complicated to me and I worked on it, so I presume it is?
I don't want to end up in some "you can work for X number of hours" situation that seems... not useful to engineers?
How do real world devs wanna consume this stuff and pay for it so there is some predictability and it's useful still?
Thank you. :)
Anyway, I've been resigned to this for a while now (see https://x.com/doodlestein/status/1949519979629469930 ) and ready to pay more to support my usage. It was really nice while it lasted. Hopefully, it's not 5x or 10x more.
Hopefully they sort it out and increase limits soon. Claude Code has been a game-changer for me and has quickly become a staple of my daily workflows.
This is also exactly why I feel this industry is sitting atop a massive bubble.
you...already were? it already had a variety of limits, they've just added one new one (total weekly use to discourage highly efficient 24/7 use of their discounted subscriptions).
I can understand setting limits, and I'd like to be aware of them as I'm using the service rather than get hit with a week long rate limit / lockout.
Upshot - I will probably go back to api billing and cancel. For my use cases (once or twice a week coding binges) it’s probably cheaper and definitely less frustrating.
Or is that a silly idea, because distillation is unlikely to be stopped by rate limits (i.e., if distillation is a worthwhile tactic, companies that want to distill from Anthropic models will gladly spend a lot more money to do it, use many, many accounts to generate synthetic data, etc.)?
If I’m on annual Pro, does it mean these won’t apply to me till my annual plan renews which is several months away.
What are the reasonable local alternatives? 128 GB of ram, reasonably-newish-proc, 12 GB of vram? I'm okay waiting for my machine to burn away on LLM experiments I'm running, but I don't want to simply stop my work and wake up at 3 AM to start working again.
I think you're just confused about what the Pro plan was, it never included being used for 168 hours/week, and was extremely clear that it was limited.
> What are the reasonable local alternatives? 128 GB of ram, reasonably-newish-proc, 12 GB of vram? I'm okay waiting for my machine to burn away on LLM experiments I'm running, but I don't want to simply stop my work and wake up at 3 AM to start working again.
a $10k Mac Studio with 192GB of unified memory running any model you can download still isn't close to Claude Sonnet.
The weekly limits are probably there to fix the account-sharing issue. For example, I wanted to ask a friend who uses the most expensive subscription for work if I could borrow the account at night and on weekends. I guess that's the kind of pattern they want to stop.
Somehow you're "not allowed" to run your account 24/7. Why the hell not? Well because then they're losing money. So it's "against their ToS". Wtf? Basically this whole Claude Code "plan" nonsense is Anthropic lighting VC on fire to aggressively capture developer market share, but enough "power users" (and don't buy the bullshit that it's "less than 5%") are inverting that cost:revenue equation enough to make even the highly capitalized Anthropic take pause.
They could have just emailed the 4.8% of users doing the dirty, saying "hey, bad news". But instead EVERYONE gets an email saying "your access to Claude Code's heavily subsidized 'plans' has been nerfed".
It's the bait and switch that just sucks the most here, even if it was obviously and clearly coming a mile away. This won't be the last cost/fee balancing that happens. This game has only gotten started. 24/7 agents are coming.
Frustrated users, who are probably using the tools the most will try other code generation tools.
That said, there's no fucking way I am getting what they claim w/Opus in hours. I may get two to three queries answered w/Opus before it switches to Sonnet in CC.
Notice they didn't say 5% of Max users. Or 5% of paid users. To take it to the extreme - if the free:paid:max ratio were 400:20:1 then 5% of users would mean 100% of a tier. I can't tell what they're saying.
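Working through that hypothetical ratio:

```python
# The made-up ratio from the comment: 400 free : 20 paid : 1 max.
free, paid, max_tier = 400, 20, 1
total = free + paid + max_tier
print(0.05 * total)   # -> 21.05: "5% of users" is about the size of the entire paid tier
```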
There are people that will always try to steal, but there may also be those that just don't understand their pricing.
Also some people keep going forever in the same session, causing it to max out - since the whole history is sent in every request. Some prompting about things like that (your thread has gotten long..) would probably save quite a bit of usage and prevent innocent users from getting locked out for a week.
Claude is vital to me and I want it to be a sustainable business. I won't hit these limits myself, and I'm saving many times what I would have spent in API costs - easily among the best money I've ever spent.
I'm middle aged, spending significant time on a hobby project which may or may not have commercial goals (undecided). It required long hours even with AI, but with Claude Code I am spending more time with family and in sports. If anyone from Anthropic is reading this, I wanted to say thanks.
If the new limits are anything less than 24 * 7 / 5 times the previous limits then power users are getting shafted (which understandably is the point of this).
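For reference, the 24 * 7 / 5 factor just counts how many 5-hour windows fit in a week:

```python
# A round-the-clock user could previously refill every 5-hour window,
# so their weekly ceiling was this many windows' worth of quota.
windows_per_week = 24 * 7 / 5
print(windows_per_week)   # -> 33.6
```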
What's worse with this model is that a runaway process could chew through your weekly API allotment on a wild goose chase. Whereas before the 5-hour quantization was both a limiter and guard rails.
Then again, to scale is human
This is the most exciting business fight of our time and I’m chomping popcorn with glee.
I think Anthropic is grossly overestimating the addressable market of a CLI tool, while also falsely believing they have a durable lead right now in their model, which I’m not so sure of. Also their treatment of their partners has been…shall we say…questionable. These are huge missteps at a time they should be instead hitting the gas harder imo.
They’re getting cocky. Would love to see a competitor to swoop in and eat their lunch.
It's easy to forget that the product Anthropic are selling here, and throttling, is based on data they mostly pay little or no content fee for.
How about adding a ToS clause to prevent abuse? Wouldn't that be better than a blanket change that negatively affects the other 95%?
Anthropic seems like they need to boost up their infra as well (glad they called this out), but the insane over-use can only be hurting this.
I just can't cosign on the waves of hate that all hinges on them adding additional limits to stop people from doing things like running up $1k bills on a $100 plan or similar. Can we not agree that that's abuse? If we're harping on the term "unlimited", I get the sentiment, but it's absolutely abuse and getting to the point where you're part of the 5% likely indicates that your usage is abusive. I'm sure some innocent usage will be caught in this, but it's nonsense to get mad at a business for not taking the bath on the chunk of users that are annihilating the service.
I switched to Claude Code because of Cursor’s monthly limits.
If I run out of my ability to use Claude Code, I’m going to just switch back to Cursor and stay there. I’m sick of these games.
If you think it’s ok, then make Anthropic dog food it by putting every employee in the pro plan and continue to tell them they must use it for their work but they can’t upgrade and see how they like it.
We know inference is very expensive so it's not reasonable to expect unlimited usage in general...
> sounds like it affects pretty much everyone who got some value out of the tool
Feels that way.
But compared to paying the so-called API pricing (hello ccusage) Claude Code Max is still a steal. I'm expecting to have to run two CC Max plans from August onwards.
$400/mo here we come. To the moon yo.
...are you allowed to do that? I guess if they don't stop you, you can do whatever you want, but I'd be nervous about an account ban.
Say an 8xB200 server costs $500,000, with 3 years of depreciation, so ~$166k/year per server. Say 10 people share that server full time, so that's ~$16.7k/year/person to break even, or a ~$1,388/month subscription at 10 users per server.
If they get it down to 100 users per server (doubt it), then they can break even at $138/month.
And all of this is just server costs...
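Restating that break-even math (all figures are the assumptions above):

```python
# Break-even subscription price per user, given the assumed server cost,
# depreciation period, and users per server from the comment above.
server_cost = 500_000          # assumed 8x B200 server
years = 3
per_user_per_month = server_cost / years / 10 / 12
print(round(per_user_per_month))              # -> 1389, i.e. ~$1,388/month at 10 users/server
print(round(server_cost / years / 100 / 12))  # -> 139, the ~$138/month figure at 100 users/server
```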
Seems AI coding agents should be a lot more expensive going forward. I'm personally using 3-4 agents in parallel as well..
Still, it's a great problem for Anthropic to have. "Stop using our products so much or we'll raise prices!"
A realistic business plan would be to burn cash for many years (potentially more than a decade), and bank on being able to decrease costs and increase revenue over that time. Investors will be funding that journey.
So it is way too early to tell whether the business plan is unsustainable. For sure the unit economics are going to be different in 5 and 10 years.
Right now is very tough though, since it is basically all early-adopter power-user types, who burn a lot of compute. Later on one can probably expect more casuals, maybe even a significant number of "gym users" who pay but basically never use the service. Though OpenAI is currently stronger for casuals, I suspect.
Over the next decade, hardware costs will go down a lot. But they have go find a way to stay alive (and competitive) until then.
PS. Ah! Of course. Agents ...
The problem is, we have no visibility into how much we’ve actually used or how much quota we have left. So we’ll just get throttled without warning—regularly. And not because we’re truly heavy users, but because that’s the easiest lever to pull.
And I suspect many of us paying $200 a month will be left wondering, “Did I use it too much? Is this my fault?” when in reality, it never was.
that's exactly what they've done? they've even put rough numbers in the email indicating what they consider to be "abusive"?
I use CC 1-3h a day. Three days a week. Am I a heavy user now? Will I be in the 5% group? If I am, who will I argue with? Anthropic says in its mail that I can cancel my subscription.
Charging a low flat fee per use and still warning when certain limits hit is possible. But it's market segmentation not to do it. Just charge a flat fee, then lop off the high-end, and you maximize profit.
The two models are not just the best models for coding at this point (in areas like UX/UI and following instructions they are unmatched); they come packaged with possibly the best command line tool today.
They invite developers to use them a lot. Yet for the first time ever, I feel like I cannot 100% rely on the tool, and feel a lot of pressure when using it. Not because I don't want to pay, but because the options are either:
> A) Pay $200 and be constantly warned by the system that you are close to hitting your quota (very bad UX)
> B) Pay $$$??? via the API and see how your bill grows to +$2k per month (this is me this month via Cursor)
I guess Anthropic has the great dilemma now: should they make the models more efficient to use and lower the prices to increase limits and boost usage OR should they cash in their cash cows while they can?
I am pretty sure no other model comes even close in terms of developer-hours at this point. Gemini would be my 2nd best guess, but Gemini is still lagging behind Claude, and not that good at agentic workloads.
Why not use the user's timezone?
We're going to punish the 5% that are using our service too much.
If you work on some overengineered codebase, it will produce overengineered code; this requires more tokens.
When you have your functional spec and your tech spec, ask it to implement it. Additionally, add some basic rules, say stuff like "don't add any fallback mechanisms or placeholders unless asked; keep a todo of where you're at; ask questions if unsure."
The key is to communicate well: ALWAYS READ what you input. Review, and provide feedback. Also, I'd recommend doing smaller chunks at a time once things get more complicated.
- It would be nice to know if there was a way to know or infer, percentage-wise, the amount of capacity a user is currently using (rate of usage) and has left, compared to available capacity. Being scared to use something is different from being mindful of it.
- Since usage can feel a little subjective/relative (simple things might use more tokens, or fewer, etc.) and depends on things beyond a user's own usage, it would be nice to know how much capacity is left, both in the current window and over the month, so there's a chance to learn.
- If there are lower "capacity" usage rates available at night vs. the day, or just during slower times, it might be worth knowing. It would help users who would like to plan around it, compared to people who might just be making the most of it.
You having the same issue kills the point of using you.
It makes no sense to me that you would tell customers “no”. Make it easy for them to give you more money.
this entire thread is people whinging about the "you get some reasonable use for a flat fee" product having the "reasonable use" level capped a bit at the extreme edge.
leveling the playing field i see lol
Economists are having a field day.
i just found ccusage, which is very helpful. i wish i could get it straight from the source, i don't know if i can trust it... according to this i've spent more than my $200 monthly subscription in token value basically daily... 30x the supposed cost
i've been trying to learn how to make claude code use opus for planning and sonnet for execution automatically, if anyone has a good example of this please share
Use /resume
Part of the reason there is so much usage is because using claude code is like a slot machine: SOMETIMES it's right, but most times it needs to rework what it did, which is convenient for them. Plus their pricing is anything but transparent as to how much usage you actually get.
I'll just go back to ChatGPT. This is not worth the headache.
I assume this is the end of the viability of the fixed price options.
> it affects pretty much everyone who got some value out of the tool AND is not a casual coder.
I didn’t mean casual in the negative sense, so there is no “better”, there is only casual and not casual.
My theory is that 5% sounds like a small number until you realize that many people just tried it and didn't like it, forgot to cancel, have it paid for by an employer wishing for 100x improvements, or have found AI useful only in certain scenarios that they face every once in a while, etc.
We do know that PR teams enjoy framing things in the most favorable light possible.
I am not saying this is what must happen here, but I see no effort to substantiate why it could not.
can someone please find a conservative, sustainable business model and stick with it for a few months instead of this mvp moving-target bs
Seems pretty standard to me.
The buffet-style pricing gets you more bang for the buck. How much more? That bit is uncertain. Adjust your expectations accordingly.