Claude Code weekly rate limits
Next month, we're introducing new weekly rate limits for Claude subscribers, affecting less than 5% of users based on current usage patterns.
Claude Code, especially as part of our subscription bundle, has seen unprecedented growth. At the same time, we’ve identified policy violations like account sharing and reselling access—and advanced usage patterns like running Claude 24/7 in the background—that are impacting system capacity for all. Our new rate limits address these issues and provide a more equitable experience for all users.
What’s changing: Starting August 28, we're introducing weekly usage limits alongside our existing 5-hour limits:
- Current: Usage limit that resets every 5 hours (no change)
- New: Overall weekly limit that resets every 7 days
- New: Claude Opus 4 weekly limit that resets every 7 days

As we learn more about how developers use Claude Code, we may adjust usage limits to better serve our community.

What this means for you: Most users won't notice any difference. The weekly limits are designed to support typical daily use across your projects. Most Max 5x users can expect 140-280 hours of Sonnet 4 and 15-35 hours of Opus 4 within their weekly rate limits. Heavy Opus users with large codebases or those running multiple Claude Code instances in parallel will hit their limits sooner. You can manage or cancel your subscription anytime in Settings. We take these decisions seriously. We're committed to supporting long-running use cases through other options in the future, but until then, weekly limits will help us maintain reliable service for everyone.
We also recognize that during this same period, users have encountered several reliability and performance issues. We've been working to fix these as quickly as possible, and will continue addressing any remaining issues over the coming days and weeks.
–The Anthropic Team
I feel like someone is going to reply that I'm too reliant on Claude or something. Maybe that's true, but I'd feel the same about the prospect of loosing ripgrep for a week, or whatever. Loosing it for a couple of days is more palatable.
Also, I find it notable they said this will affect "less than 5% of users". I'm used to these types of announcements claiming they'll affect less than 1%. Anthropic is saying that one out of every 20 users will hit the new limit.
*edited to change “pro” to “plus”
As a pedantic note, I would say 'ration'. Things you hoard don't magically go away after some period of time.
Rationed/hoarded do imply, to me, something different about how the quantity came to be though. Rationed being given or setting aside a fixed amount, hoarded being that you stockpiled/amassed it. Saying "you hoarded your rations" (whether they will expire) does feel more on the money than "you ration your rations" from that perspective.
I hope this doesn't come off too "well aktually", I've just been thinking about how I still realize different meanings/origins of common words later in life and the odd things that trigger me to think about it differently for the first time. A recent one for me was that "whoever" has the (fairly obvious) etymology of who+ever https://www.etymonline.com/word/whoever vs something like balloon, which has a comparatively more complex history https://www.etymonline.com/word/balloon
I just love this community for these silly things.
Rationing suggests a deliberate, calculated plan: we’ll eat this much at these particular times so our food lasts that long. Hoard seems more ad hoc and fear-driven: better keep yet another beat-up VGA cable, just in case.
Counterexample: animals hoarding food for winter time, etc.
One could theoretically ration their rations out further... but that would require knowing your usage well enough to set the remaining fixed amounts - which is precisely what's missing in the interface.
So, back to hoarding.
One day a few hours of prompting is fine, another day you'll hit your weekly limit and you're out for seven days.
While still paying your subscription.
I can't think of any other product or service which operates on this basis - where you're charged a set fee, but the access you get varies from hour to hour entirely at the provider's whim. And if you hit a limit which is a moving target you can't even check, you're locked out of the service.
It's ridiculous. Begging for a lawsuit, tbh.
What they could do is pay as you go, with pricing increasing with the demand (Uber style), but I don't think people would like that much.
Decided to give PRO a try when I kept getting terrible results from the $20 option.
So far it's perhaps 20% improved in complex code generation.
It still has the extremely annoying ~350 line limit in its output.
It still IGNORES EXPLICIT CONTINUOUS INSTRUCTIONS, e.g.: do not remove existing comments.
The opaque overriding rules - despite it begging forgiveness when it ignores instructions - are extremely frustrating!!
Often they're better at recognizing failures to stick to the rules and fixing the problems than they are at consistently following the rules in a single shot.
This does mean that often having an LLM agent do a thing works but is slower than just doing it myself. Still, I can sometimes kick off a workflow before joining a meeting, so maybe the hours I've spent playing with these tools will eventually pay for themselves in improved future productivity.
But for things I have no idea about, like medicine, it feels very convincing. Am I at risk?
People don’t understand Dunning-Kruger. People are prone to biases and fallacies. Likely all LLMs are inept at objectivity.
My instructions to LLMs are always strictness, no false claims, Bayesian likelihoods on every claim. Some models ignore the instructions voluntarily, while others stick strictly to them. In the end it doesn’t matter when they insist on 99% confidence on refuted fantasies.
Reality is probably that there’s a backlog item to implement a view, but it’s hard to prioritize over core features.
Back to the conspiracy ^^
It's even harder to prioritize when the feature you pay to develop probably costs you money.
I have zero doubt that this is working exactly as intended. We will keep all our users at 80% of what we sold them by keeping them anxious about how close they are to the limit.
Hover on it on a desktop, it’ll show how many requests you have left.
This isn't like a gym membership where people join aspirationally. No one's new year's resolution is "I'm going to use o3 more often."
Unless you use "free" GPT 4.1 like MS wants you to (not the same as Claude, even with Beast Mode). And how long is that going to be free? Because it feels like a design to simply push you to an MS product (MS > OpenAI) instead of a third party.
So what happens a year from now? Paid GPT 5.1? With 4.1 being removed? If it were not for the insane prices of actual large-mem GPUs and the slowness of large models, I would be using LLMs at home. Right now MS/Anthropic/OpenAI are right in that zone where it's not too expensive yet to go full local LLM.
The human brain is stupid and remarkably exploitable. Just a teensy little bit of information hiding can elicit strange and self-destructive behavior from people.
You aren't cut off until you're cut off, then it's over completely. That's scary, because there's no recourse. So people are going to try to avoid that as much as possible. Since they don't know how much they're using, they're naturally going to err on the side of caution - paying for more than they need.
The dark pattern isn’t the usage limit. It’s the lack of information about current and remaining usage.
If I sit down for dinner at an all-you-can-eat buffet, I get to decide how much I’m having for dinner. I don’t mind if they don’t let me take leftovers, as it is already understood that they mean as much as I can eat in one sitting.
If they don’t want folks to take advantage of an advertised offer, then they should change their sales pitch. It’s explicitly not gaming any system to use what you’re paying for in full. That’s your right and privilege as that’s the bill of goods you bought and were sold.
I also find it hard to believe 5% of customers are doing that, though.
As I said, I have trouble believing this constitutes 5% of users, but it constitutes something and yeah, I feel Anthropic is justified in putting a cap on that.
I also wouldn't consider my usage extreme. I never use more than one instance, don't run overnight, etc.
For example, I don’t mind that Netflix pauses playback after playing continuously for a few episodes of a show, because the options they present me with acknowledge different use cases. The options are: stop playing, play now and ask me again later, and play now and don’t ask me again. These options are kind to the user because they don’t disable the power user option.
I haven't yet run into this limit...
Do I read this correctly? Only 100 messages per week, on the pro plan worth a few hundred bucks a month?!
Per their website: https://help.openai.com/en/articles/9793128-what-is-chatgpt-...
There are no usage caps on pro users (subject to some common sense terms of use).
I have a pro plan and I hammer o3–I’d guess more than a hundred a day sometimes—and have never run into limits personally
Wouldn’t shock me if something like that happened but haven’t seen evidence of it yet
Apple is your business partner, doing marketing and distribution for you, and shares its user base. Bloomberg terminals provide real-time data and UI to non-technical finance people. Github provides you a Git hosting service so you don't need to set up and maintain servers. MATLAB (although there are Octave, Python and open alternatives) sells a numerical computation environment to non-CS engineers. Xilinx sells its hardware and dev tools. Game devs use Unity because they want to focus on gameplay and not game engine development.
These are all examples of division of labor. This time, however, you have to pay for your core competency, because you cannot compete with a good AI coder in the long run. The value you provide diminishes to almost nothing. Yes, you can write prompts, but anyone, even a mediocre LLM, can write prompts these days. If you need some software, you don't need to hire SW engineers anymore. A handful of vendors dominate the SW development market. Yes, you can switch. But only between the 3 or 4 tech giants. It's an oligopoly.
If we have FOSS alternatives, at least we can build new services around them and can move on to this new era. We can adapt. Otherwise, we become a human frontend between the client and the AI giants.
But indeed it always struck me that some developers decided to become Apple developers and sacrifice 30% of everything they ever produce to Apple.
I would argue that it might be a bit different though, because when doing iOS development it's possible that you don't lose your core skill, which is building software, and that you can switch to another platform with relative ease. What I think might happen with LLMs is that people will lose the core skill (maybe not the generation who did do LLM-less development, but some devs might eventually never know other ways to work, and will become digital vassals of whatever service managed to kill all others).
In exchange for 500% more paid users
just three possible examples
It will change. There will be FOSS models, once it no longer takes hundreds of millions of dollars to train them.
The 'take me from A to N' is a pretty broad problem that can have many different solutions. Is that comparable?
We can all see this ending up in an oligopoly, no?
If you meant it goes down for good, then I'm sure it would be annoying for a few weeks for the FOSS ecosystem, just the time to migrate elsewhere, but there is not much GitHub-specific we would really miss.
Sure, devs can still work without AI.
But if the developer who uses AI has more output than the one who doesn't, it naturally incentivizes everyone to leverage AI more and more.
And note that I objected to online services; local LLMs don't have the same issues.
Think of driving a car. If the shortest path (in terms of travel time) goes through a traffic jam, and there is a longer path where you can drive much faster, it's very likely that most people will feel more efficient on the longer path.
Also the slowdown from using LLMs might be more subtle and harder to measure. It might happen at code review time, in handling more bugs and incidents, harder maintenance, recovering your deleted DB ;)...
I can see the impact on my own output both in quantity and quality (LLMs can come up with ideas I would not come up with myself, and are very useful for tinkering and quickly testing different solutions).
As with any tool, it is up to the user to make the best of it and understand the limits.
At this point it is clear that naysayers:
1) either don't understand our job
2) or haven't given AI tools the proper stress testing in different conditions
3) or are luddites being defensive about the "old" world
[1] https://www.antirez.com/news/154
```
The fundamental requirement for the LLM to be used is: don’t use agents or things like editor with integrated coding agents. You want to:
* Always show things to the most able model, the frontier LLM itself.
* Avoid any RAG that will show only part of the code / context to the LLM. This destroys LLMs performance. You must be in control of what the LLM can see when providing a reply.
* Always be part of the loop by moving code by hand from your terminal to the LLM web interface: this guarantees that you follow every process. You are still the coder, but augmented.
```
Not sure about you, but I think this process, which your source seems to present as a prerequisite to using LLMs efficiently (and which seems good advice to me too, and actually very similar to how I use LLMs myself), must be followed by less than 1% of LLM users.
The deciding factor is not speed. It is knowledge. Will I be able to dish out a great compiler in a week? Probably not. But an especially knowledgeable compiler engineer might just do it, for a simple language. Situations like this are the only 10x we have in our profession, if we don't count completely incapable people. The use of AI doesn't make you 1000x. It might make you output an infinite factor of AI slop more, but then you are just pushing the maintenance burden to a later point in time. In total it might make your output completely useless in the long run, making you a 0x dev in the worst case.
So far almost no code I got from LLMs was acceptable to stay as suggested. I found it useful in cases, when I myself didn't know what a typical (!) way is to do things with some framework, but even then often opted for another way, depending on my project's goals and design. Sometimes useful to get you unstuck, but oh boy I wouldn't let it code for me. Then I would have to review so much bad code, it would be very frustrating.
Nothing about using an LLM removes skills and abilities you already had before it.
And yes, the goal might be to only use it for boilerplate or first draft. But that's today, people are lazy, just wait for the you of tomorrow
Just because you state it, it doesn't make it true. I could tell you that taking buses or robotaxis doesn't change a bit your ability to drive.
Funny story: the widespread use of Knorr soup stock has already made people unable to cook their own stock, or, even worse, lose the skill to season their soup from just basic, fresh ingredients.
Source: my mom.
And just as with cooking: most people won't care - and the same goes with LLMs. It can be good enough... Less efficient? Meh - cloud. AI slop image? Meh - cheaper than paying an artist. LLMs to get kids through school? Meh - something something school-of-life.
I look around and see many poorly educated people leaning hard into LLMs. These people are confusing parroting their prompt output as knowledge, especially in the education realm. And while LLMs may not "remove skills and abilities you already had before it" - you damn sure will lose any edge you had over time. It's a slippery slope of trading a honed skill for convenience. And in some cases that may be a worthwhile trade. In others that is a disaster waiting to happen.
Now, maybe that is the future (no more/extremely little human-written code). Maybe that's a good thing in the same way that "x technological advancement means y skill is no longer necessary" - like how the advent of readily-accessible live maps means you don't need to memorize street intersections and directions or whatever. But it is true.
There was research about vibe coding that had similar conclusion. Feels productive but can take longer to review.
the moment you generate code you don't instantly understand you are better off reading the docs and writing it yourself
I regularly hit the Pro limits 3 times a day using Sonnet. If I use Claude Code & Claude it's over in about 30 minutes. No multi 24/7 agent whatever, no multiple windows open (except using Claude to write a letter between Claude Code thoughts).
I highly doubt I am a top 5%er - but I won't be shocked if my week ends on a Wednesday. I was just starting to use Claude chat more as it is in my subscription, but if I can not rely on it to be available for multiple days it's functionally useless - I won't even bother.
Can you share what you're doing? I've been experimenting with Claude Code and I feel like I have to be doing a lot with it before I even start seeing the usage warning limits on the $20/month plan.
When I see people claiming they're getting rate limited after 30 minutes on the $100/month plan I have a hard time understanding what they're doing so different.
For what it's worth I don't use it every day, so maybe there's a separate rate that applies to heavy and frequent users?
And I guess it'll go downhill from here. Anthropic, I wish you the best. Claude is a great tool at good value. But if you keep changing the product after my purchase, that's bad value.
You very well might be a top 5%er among people only on the Pro rather than Max plan
Low danger task so I let it do as it pleased - 30 minutes and was maxed out. Could probably have reduced context with a /clear after every file but then I would have to participate.
I usually use Tasks for running tests, code generation, summarizing code flows, and performing web searches on docs and summarizing the necessary parts I need for later operations.
Running them in parallel is nice if you want to document code flows and have each task focus on a higher level grouping, that way each task is hyper focused on its own domain and they all run together so you don’t have to wait as long, for example:
- “Feature A’s configuration” - “Feature A’s access control” - “Feature A’s invoicing”
So, if rate limits are based on an overall token cost, it is likely that one will hit them first if CC reads a few files and writes a lot of text as output (comments/documentation) rather than if it analyzes a large codebase and then makes a few edits in code.
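A rough illustration of the asymmetry (a sketch only: it assumes Sonnet-class API list prices of roughly $3 per million input tokens and $15 per million output tokens; how the subscription limits are actually metered isn't public, and which session ends up pricier obviously depends on the real token counts):

```python
# Back-of-envelope cost comparison under assumed Sonnet-class list prices.
INPUT_RATE = 3 / 1_000_000    # ~$3 per million input tokens (assumed)
OUTPUT_RATE = 15 / 1_000_000  # ~$15 per million output tokens (assumed)

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session at the assumed per-token rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Reads a few files, then writes a lot of comments/documentation:
write_heavy = session_cost(input_tokens=20_000, output_tokens=80_000)

# Reads a large chunk of the codebase, then emits a few small edits:
read_heavy = session_cost(input_tokens=300_000, output_tokens=3_000)

print(f"write-heavy session: ${write_heavy:.2f}")  # ~$1.26
print(f"read-heavy session:  ${read_heavy:.2f}")   # ~$0.95
```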
Very good point, I find it unlikely that 1/20 users is account sharing or running 24/7 agentic workflows.
The stat would be more interesting if instead of 1 in 20 users, they said x in y of users with at least one commit per business day, or with at least one coding question per day, or whatever.
I suspect this could be a significantly higher percentage of professional users they plan to throttle. Be careful of defining Pro like Apple does if you market to actual professionals who earn based on using your product. Your DAUs might be a different ratio than you expect.
I imagine there are lots of people like me who have a subscription to be aware of the product and do some very light work, but the "real" users who rely on the tool might be badly affected by this.
I can personally think of a few internally licensed products, announced with huge fanfare, which never get used beyond the demo to a VP.
Well, not the entire week, however much of it is left. You said you probably won't hit it -- if you do, it's very likely to be in the last 36 hours (roughly 20% of a week), right? And you can pay for API usage anyway if you want.
Just to nitpick: when the limit is a week, going over it does not mean losing access for a week, but for the remaining time, which would, assuming the limits aren't overly aggressive, mean losing access for at most a couple of days (which you say is more palatable).
I wouldn't say you're too reliant, but it's still good to stay sharp by coding manually every once in a while.
(2) I interpret this change as targeting people who are abusing a single Pro account, using it more like a multi-developer business would, maximizing the number of tokens (multiple sessions running 24/7, always hitting the limits). Anthropic has a business interest in pushing those users to use the API (paying per token) or upgrade to the $200/mo subscription.
(3) While I fear they might regularly continue to push the top x% usage tier users into the higher subscription rate, I also realize this is the first adjustment for token rates of Claude Pro since Claude Code became available on that subscription.
(4) If you don’t want to wait for the next unthrottling, you can always switch to the API usage and pay per token until you are unblocked.
the principle: let's protect against outliers without rocking the behavior of the majority, not at this stage of PMF and market discovery
i'd also project out just how much the compute would cost for the outlier cohort - are we talking $5M, $100M, $1B per year? And then what behaviors will simply be missed by putting these caps in now - is it worth missing out on success stories coming from elite and creative users?
I'm sure this debate was held internally but still...
They undercharged for this product to collect usage data to build better coding agents in the future. It was a ploy for data.
Anecdotally, I use Claude Code with the $20/mo subscription. I just use it for personal projects, so I figured $20 was my limit on what I’d be willing to spend to play around with it. I historically hit my limits just a few times, after ~4hrs of usage (resets every 5hrs). They recently updated the system and I hit my limits consistently within an hour or two. I’m guessing this weekly limit will affect me.
I found a CLI tool (which I found in this thread today) that estimates I’m using ~$150/mo in usage if I paid through the API. Obviously this is very different from my payments. If this was a professional tool, maybe I’d pay, but not as a hobbyist.
I’m guessing that they did, and that that’s what this policy is.
If you’re talking about detecting account sharing/reselling, I’m guessing they have some heuristics, but they really don’t want the bad press from falsely accusing people of that stuff.
my point is that 5% is still a large cohort and they happen to be your most excited/creative cohort. they might not all want to pay a surcharge yet while everyone is discovering the use cases / patterns / etc
having said that, entirely possible burn rate math and urgency requires this approach
The announcement says that, using historical data, less than 5% of users would even be impacted.
That seems kind of clear: The majority of users will never notice.
that 5% is probably the most creative and excited cohort. obviously it's critical to not make the experience terrible for the 95% core, but i'd hate to lose even a minority of the power users who want to build incredible things on the platform
having said that, the team is elite, sure they are thinking about all angles of this issue
But those power users are often your most creative, most productive, and most likely to generate standout use cases or case studies. Unless they’re outright abusing the system, I’d lean toward designing for them, not against them.
if the concern is genuine abuse, that feels like something you handle with escalation protocols: flag unusual usage, notify users, and apply adaptive caps if needed. Blanket restrictions risk penalizing your most valuable contributors before you’ve even discovered what they might build
that's exactly what they have done - the minority of accounts that consume many standard deviations above the mean of resources will be limited, everyone else will be unaffected.
correct me if I'm wrong, it's not like we have visibility into the token limit logic, even on the 5hr window?
but if you use an agent and it tries to include a 500kb json file, yeah, you will light cash on fire
(this happened to me today but the rate limit brought it to my attention.)
For brief, low context interactions, it is crazy how far your money goes.
Now your vibes can be at the beach.
I'm pretty sure they calibrated it so that only the people who max out every 5 hour window consistently get hit by the weekly quota.
Given that I rarely hit the session limits I’m hopeful I won’t be affected, but the complete and utter lack of transparency is really frustrating.
as many other services did, and even some tangible products are implementing, the introduced limit will later on be used to create more tiers and charge you more for the same without providing anything extra. #shrinkflation
Do we even know the Anthropic financials? My guess is that they're probably losing money on all their tiers.
Sorry, I'll just be "that guy" for a moment. Assuming that access is cut at a random time during the week, the average number of days without Claude would be 3.5. That's not reasonable as it's dependent on usage. So assume that you've always been just shy of hitting the limit, and you increase usage by 50%; then you'd hit the limit 4.67 days in. Just 2-3 hours shy of the weekend - a sort of reward for the week's increased effort.
Have a blessed Thuesday.
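For anyone checking the arithmetic above, both numbers fall out of simple assumptions (a uniformly random cutoff time, then a steady usage rate at 1.5x the rate that previously exhausted the quota in exactly 7 days):

```latex
% Cutoff at a uniformly random time t \in [0, 7] days:
E[\text{days locked out}] = \int_0^7 \frac{7 - t}{7}\, dt = 3.5

% Steady usage at 1.5x the just-barely-under-the-limit rate:
t_{\text{hit}} = \frac{7\ \text{days}}{1.5} \approx 4.67\ \text{days}
```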
I know entire offices in Bangladesh share some of these accounts, so I can see how it is a problem.
If it's affecting 5% of users, it might be people who are really pushing it and might not know (hopefully they get a specialized notice that they may see usage differences).
If I had used an LLM, maybe I wouldn't have misspelled "losing" not once but twice and not noticed until after the edit window. <_<
FYI many input methods (including my own) turn two hyphens into an em dash. The em-dash-means-LLM thing is bogus.
1- “More throughput” on the API, but stealth caps in the UI - On Jun 19 Anthropic told devs the API now supports higher per-minute throughput and larger batch sizes, touting this as proof the underlying infra is scaling. Yay!?? - A week later they roll out weekly hard stops on the $100/$200 “Max” plans — affecting up to 5 % of all users by their own admission.
Those two signals don’t reconcile. If capacity really went up, why the new choke point? I keep getting this odd visceral reaction/anticipation that each time they announce something good, we are gonna get whacked on an existing use case.
2- Sub-agents encourage 24x7 workflows, then get punished… The Sub-agent feature docs literally showcase spawning parallel tasks that run unattended. Now the same behavior is cited as “advanced usage … impacting system capacity.”
You can’t market “let Claude handle everything in the background” and then blame users who do exactly that. You’re holding it wrong?
3- Opaqueness forces rationing (the other poster comments re: rationing vs hoarding; I can’t reconcile it being hoarding since it’s use it or lose it.)
There’s still no real-time meter inside Claude/CC, only a vague icon that turns red near 50%. Power users end up rationing queries because hitting the weekly wall means a seven-day timeout. That’s a dark, dark pattern if I’ve ever seen one; I’d think not appropriate for developer tooling. (ccusage is a helpful tool that shouldn’t be needed!)
The “you’re holding it wrong” response seems so bizarre to me, meanwhile all of the other signaling is about more usage, more use cases, more dependency.
Yeah, the new sub-agents feature (which is great) is effectively unusable with the current rate limits.
Internet, text messages, etc are roughly that: the direct costs are so cheap.
That’s not the case with LLM’s at this moment. There are significant direct costs to each long-running agent.
But the cost to Bell and British Telecom was not £2 per minute, or £1 per minute, or even 1p per minute, it was nothing at all. Their costs were not for the call, but for the infrastructure over which the call was delivered, a transatlantic cable. If there was one call for ten minutes, once a week essentially at random, that cable must still exist, but if there are 10 thousand call minutes per week, a thousand times more, it's the same cable.
So the big telcos all just picked a number and understood it as basically free income. If everybody agrees this call costs £2 then it costs £2 right, and those 10 thousand call minutes generate a Million pound annual income.
It's maybe easier for Americans to understand if you tell them that outside the US the local telephone calls cost money back then. Why were your calls free? Because why not, the decision to charge for the calls is arbitrary, the calls don't actually cost anything, but you will need to charge somehow to recoup the maintenance costs. In the US the long distance calls were more expensive to make up for this for a time, today it's all absorbed in a monthly access fee on most plans.
In the US, ATT was just barely deregulated by then so the prices were not just 'out of thin air'.
Its successor TAT-8 carried ten times as many calls a few years later; industry professionals opined that there was likely no demand for so many transatlantic calls and so it would never be full. Less than two years later TAT-8 capacity maxed out and TAT-9 was already being planned.
Today lots of people have home Internet service significantly faster than all three of these transatlantic cables put together.
Prices will probably also drop if anyone ever works out how to feasibly compete with NVIDIA. Not an expert here, but I expect they're worried about competition regulators, who will be watching them very closely.
No, they won't. Because "AI assistants" are mostly wrapped around a very limited number of third-party providers.
And those providers are hemorrhaging money like crazy, and will raise the prices, limit available resources and cut off external access — all at the same time. Some of it is already happening.
It’s very expensive to create these models and serve them at scale.
Eventually the processing power required to create them will come down, but that’s going to be a while.
Even if there was a breakthrough GPU technology announced tomorrow, it would take several years before it could be put into production.
And pretty much only TSMC can produce cutting edge chips at scale and they have their hands full.
Between Anthropic, xAI and OpenAI, these companies have raised about $84 billion dollars in venture capital… VCs are going to want a return on their investment.
So it’s going to be a while…
How much has any of these decreased over the last 5 decades? The problem is that as of right now, LLM cost is linearly (if not exponentially) related to the output. It's basically "transferring energy" converted into bytes. So unless we see some breakthrough in energy generation, or better use of it, it will be difficult to scale.
This makes me wonder, would it be possible to pre-compute some kind of "rainbow tables" equivalent for LLMs? Either stored in the client or in the server, so as to reduce the computing needed for inference.
If you think about it, LLMs are used mostly when people are awake, at least right now. And when is the sun shining? Right. So, build a data-center somewhere where land is cheap and lots of solar panels can be build right next to it. Sure, some other energy source will be used for stability etc., but it won't be as expensive as the energy price for your home.
> This makes me wonder, would it be possible to pre-compute some kind of "rainbow tables" equivalent for LLMs?
Already happening. Read up on how those companies do caching prompt-prefixes etc.
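For the curious, this is roughly what opting into prefix caching looks like from the API side. A minimal sketch assuming Anthropic's documented prompt-caching feature; the model id, file name, and exact cache_control shape here are illustrative assumptions, not a statement of how the subscription products meter anything:

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Any big, unchanging prefix: project conventions, API docs, a codebase summary.
LONG_STATIC_CONTEXT = open("PROJECT_NOTES.md").read()  # hypothetical file

# Marking the prefix cacheable lets the provider reuse its precomputed state
# on subsequent calls instead of re-running prefill over the whole prefix.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # assumed model id; use whatever you run
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_STATIC_CONTEXT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Where is the retry logic implemented?"}],
)
print(response.content[0].text)
```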
I'd be curious to know how many tokens the average $200/mo user uses and what the cost on their end for it is.
That's why using the API directly and paying for tokens for anything past that basic usage feels a bit nicer, since it's my wallet that becomes the limitation then, not some arbitrary limits dreamed up by others. Plus with something like OpenRouter, you can also avoid subscription tier related limits like https://docs.anthropic.com/en/api/rate-limits#rate-limits
Though for now Gemini 2.5 Pro seems to work a bit better than Claude for my code writing/refactoring/explanation/exploration needs. Curious what other cost competitive options are out there.
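If anyone wants to try the pay-per-token route mentioned above: OpenRouter exposes an OpenAI-compatible endpoint, so switching is roughly the sketch below. The model slug is an assumption; check their catalog for current names and prices.

```python
from openai import OpenAI  # pip install openai

# OpenRouter speaks the OpenAI chat-completions protocol; point the client
# at their base URL and use an OpenRouter API key instead.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)

resp = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",  # assumed slug; any listed model works
    messages=[{"role": "user", "content": "Refactor this function to be iterative."}],
)
print(resp.choices[0].message.content)
```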
Except for one catastrophic binge where I accidentally left Opus on for a whole binge (KILL ME!!!), I use around $150/month. I like having the spigot off when I am not working.
Would the $100/month plan plus API for overflow come out ahead? Certainly on some months. Over the year, I don't know. I'll let you know.
I run Gemini Pro from within CC but I only use it for analysis and planning for which it is better than Claude (Opus).
I guess if your target language is Python or JS/TS etc., your mileage may be considerably better.
For Rust it's simply not true.
Note: all of them sometimes screw up applying diffs, but in general are good enough.
So the team at least seems to be aware of its shortcomings in that area and working to improve it with some success which I appreciate.
But you are correct that Gemini CLI still lags behind for whatever reason. It gets stuck in endless thought loops way too often for me, like maybe 1/15 tasks hits a thought loop burning API credits or it just never exits from the “Completing task, Verifying completion, Reviewing completion, Assessing completion status…” phase (watching the comical number of ways it rephrases it is pretty funny though).
Meanwhile I’ve only had maybe one loop over a period of a couple months using Gemini 2.5 Pro heavily in Roo Code with the most recent version so it seems like an issue with the CLI specifically.
Mask off completely and just make it completely usage based for everyone. You could do something for trial users like first 20 (pick your number here) requests are free if you really need to in order to get people on board. Or you could do tiered pricing like first 20 free, next 200 for X rate, next 200 for X*1.25 rate, and then for really high usage users charge the full cost to make up for their extreme patterns. With this they can still subsidize for the people who stay lower on usage rates for market share. Of course you can replace 200 requests with just token usage if that makes sense but I'm sure they can do the math to make it work with request limits if they work hard enough.
Offer better than open-router pricing and that keeps people in your system instead of reaching for 3rd party tools.
If your tool is that good, even with usage based it will get users. The issue is all the providers are both subsidizing users to get market share, but also trying to prohibit bad actors and the most egregious usage patterns. The only way this 100% becomes a non-issue is usage based for everything with no entry fee.
But this also hurts some who pay a subscription but DON'T use enough to account for the usage based fees. So some sales people probably don't like that option either. It also makes it easier for people to shop around instead of feeling stuck for a month or two since most people don't want multiple subs at once.
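The tiered scheme from a couple of comments up is simple enough to write down. A minimal sketch with invented numbers (20 free requests, 200 at a base rate, 200 at 1.25x, then full cost), just to show the mechanics:

```python
def tiered_price(requests: int,
                 base_rate: float = 0.05,
                 full_cost: float = 0.15) -> float:
    """Price a month of usage under the tiered scheme sketched above.

    Tiers (all numbers are illustrative, not Anthropic's):
      - first 20 requests free
      - next 200 at base_rate each
      - next 200 at 1.25 * base_rate each
      - everything beyond that at full_cost each
    """
    tiers = [
        (20, 0.0),                # free allowance
        (200, base_rate),         # subsidized tier
        (200, base_rate * 1.25),  # less subsidized tier
    ]
    total, remaining = 0.0, requests
    for size, rate in tiers:
        used = min(remaining, size)
        total += used * rate
        remaining -= used
    return total + remaining * full_cost  # heavy users pay the real cost

print(tiered_price(50))    # 30 paid requests at the base rate -> $1.50
print(tiered_price(1000))  # deep into the full-cost tier -> $109.50
```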
If I'm paying a flat rate, the only economic cost I am worrying about is "will this be faster than me doing it myself if it fails once or twice?"
If I am paying per token, and it goes off for 20 minutes without solving the problem, I've just spent $$ for no result. Why would I even bother using it?
For something like Claude Code, that's an even more concerning issue - how many background tasks have to fail before I reach my monthly spending limit? How do I get granular control to say "only spend 7 dollars on this task - stop if you cannot succeed." - and I have to write my own accounting system for whether it succeeds or fails.
I think that you should just subscribe to a preset allotment of tokens at a certain price, or a base tier with incremental usage costs for models that aren’t tiny (like paid per minute “long distance calling”).
I use an LLM tool that shows the cost associated with each message/request and most are pennies each. There’s a point where the friction of paying is a disincentive to using it. Imagine you had to pay $0.01 every time you Google searched something? Most people would never use the product because trying to pay $0.30/mo for one day a month of usage is annoying. And no one would want to prepay and fund an account if you weren’t familiar with the product. No consumer likes micro transactions
No one wants to hear this, but the answer is advertising and it will change the game of LLMs. Once you can subsidize the lowest end usage, the incentive for businesses to offer these $20 subscriptions will change, and they’d charge per-usage rates for commercial users.
The problem is that there's no way to gauge or control token usage.
I have no idea why Claude Code wrote that it consumed X tokens now, and Y tokens later, and what to do about it
I'm a fan of having both a subscription and a usage based plan available. The subscription is effectively a built in spending limit. If I regularly hit it and need more value, I can switch to an API key for unlimited usage.
The downside is you are potentially paying for something you don't use, but that is the same for all subscription services.
But I have slow months and think that might not actually be the winner. Basically I'm going to wait and see before I sign up for auto-pay.
Maybe that reflects higher underlying costs. Maybe their API prices are just inflated.
you can already pay per token by giving Claude Code an API key, if you want.
thus, the subtext of every complaint on this thread is that people want "unlimited" and they want their particular use to be under whatever the cap is, and they want it to be cheap.
No wonder that access to an expensive API which is an LLM is also rate-limited.
What does surprise me is that you can't buy an extra serving by paying more (twice the limit for 3x the cost, for instance). Either subscriptions don't make enough money, or their limits are at their datacenters and they have no spare capacity for premium plans.
LLMs will become more efficient, GPUs, memory and storage will continue to become cheaper and more commonplace. We’re just in the awkward early days where things are still being figured out.
My biggest issue is local models I can run on my m1/m4 mbp are not smart enough to use tools consistently, and the context windows are too small for iterative uses.
The last year has seen a lot of improvement in small models though (gemma 3n is fantastic), so hopefully it’s only a matter of time.
I'm assuming it'll get updated to include these windows as well. Pass in "blocks --live" to get a live dashboard!
ETA: You don’t need to authenticate or share your login with this utility, basically zero setup.
See that screenshot. It certainly shows you when your 5 hour session is set to refresh, in my understanding it also attempts to show you how you're doing with other limits via projection.
For example Stack Overflow used to handle all their traffic from 9 on-prem servers (not sure if this is still the case). Millions of daily users. Power consumption and hardware cost is completely insignificant in this case.
LLM inference pricing is mostly driven by power consumption and hardware cost (which also takes a lot of power/heat to manufacture).
They just finished their migration to the cloud, unracked their servers a few weeks ago https://stackoverflow.blog/2025/07/16/the-great-unracking-sa...
The infrastructure and hardware costs are seriously more costly than typical internet apps and storage.
Well, it is a limited resource, I'm glad they're making that clear.
Lots of things still have usage-based pricing (last I checked no gas stations are offering "all you can fill up" specials), and those things work out fine.
Unless/until I start having problems with limits, I'm willing to reserve judgment. On a max plan, I expect to be able to use it throughout my workday without hitting limits. Occasionally, I run a couple instances because I'm multitasking and those were the only times I would hit limits on the 5x plan. I can live with that. I don't hit limits on the 20x plan.
The stuff that we do now, my 13 year old self in 1994 would never dream of! When I dialed my 33.6kbps modem and left it going the whole night, to download an mp3.
It's exciting that nowadays we complain about Intelligent Agents bandwidth plans!! Can you imagine! I cannot imagine the stuff that will be built when this tech has the same availability as The Internet, or POTS!
Opus at 24-40 looks pretty good too. A little hard to believe they aren't losing a bunch of money still if you're using those limits tbh.
You can make your own comparison to however many hours you usually spend working in a week and how many sessions you have active on average during that time.
I don't really know how it's sustainable for something like SOTA LLMs.
Once enough developers are addicted to AI assisted coding the VCs will inevitably pull the rug.
I wonder if Alibaba will put out a 100B A10B coder model which could probably run for $0.5/M while giving decent output. That would be easily affordable for most developers/companies.
If anything pops this bubble, it won’t be ethics panels or model tweaks but subscription prices finally reflecting those electricity bills.
At that point, companies might rediscover the ROI of good old meat based AI.
That’s like saying when the price of gasoline gets too high, people will stop driving.
Once a lifestyle is based on driving (like commuting from the suburbs to a job in the city), it’s quite difficult and in some cases, impossible without disrupting everything else.
A gallon of gas is about 892% higher in 2025 than it was in 1970 (not adjusted for inflation) and yet most people in the US still drive.
The benefits of LLMs are too numerous to put that genie back in the bottle.
We’re at the original Mac (128K of RAM, 9-inch B&W screen, no hard drive) stage of LLMs as a mainstream product.
People get electric cars or public transport....
> Adjusting for long-term ridership trends on each system, seasonal effects, and inertia (the tendency for ridership totals to persist from one month to the next), CBO estimates that the same increase of 20 percent in gasoline prices that affects freeway traffic volume is associated with an increase of 1.9 percent in average system ridership. That result is moderately statistically significant: It can be asserted with 95 percent confidence that higher gasoline prices are associated with increased ridership.
https://www.cbo.gov/sites/default/files/110th-congress-2007-...
Though some of us might fall into the NS category instead.
It suggests:
Transparent queueing - Instead of blocking, queue requests with clear wait time estimates. Users can choose to wait or reschedule.
Usage smoothing - Soft caps with gradually increasing response times (e.g., 2s → 5s → 10s) rather than hard cutoffs.
Declared priority queues - Let users specify request urgency. Background tasks get lower priority but aren't blocked.
Time-based scheduling - Allow users to schedule non-urgent work during off-peak hours at standard rates.
Burst credits - Banking system where users accumulate credits during low usage periods for occasional heavy use.
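The burst-credits idea maps naturally onto a token bucket that accrues while you're idle. A minimal sketch with invented numbers, just to illustrate the mechanic:

```python
import time

class BurstCredits:
    """Token bucket that banks unused capacity for occasional heavy use.

    accrual_per_hour: credits earned while idle or under-using
    cap: maximum credits that can be banked (so nobody hoards forever)
    """
    def __init__(self, accrual_per_hour: float = 100.0, cap: float = 2000.0):
        self.accrual_per_hour = accrual_per_hour
        self.cap = cap
        self.credits = 0.0
        self.last = time.monotonic()

    def _accrue(self) -> None:
        now = time.monotonic()
        hours = (now - self.last) / 3600.0
        self.credits = min(self.cap, self.credits + hours * self.accrual_per_hour)
        self.last = now

    def try_spend(self, cost: float) -> bool:
        """Spend banked credits on a request; False means fall back to the
        normal (slower / queued / capped) path rather than a hard block."""
        self._accrue()
        if self.credits >= cost:
            self.credits -= cost
            return True
        return False
```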
I know nobody else really cares.. In some ways I wish I didn't think like this.. But at this point it's not even an ethical thing, it's just a weird fixation. Like I can't help but feel we are all using ovens when we would be fine with toasters.
> "Most Max 20x users can expect 240-480 hours of Sonnet 4 and 24-40 hours of Opus 4 within their weekly rate limits."
In this post it says:
> "Most Max 5x users can expect 140-280 hours of Sonnet 4 and 15-35 hours of Opus 4 within their weekly rate limits."
How is the "Max 20x" only an additional 5-9 hours of Opus 4, and not 4x that of "Max 5x"? At least I'd expect a doubling, since I'm paying twice as much.
Transformer self-attention costs scale roughly quadratically with context window size. Servicing prompts in a 32k-token window uses much more compute per request than in an 8k-token window.
A Max 5× user on an 8k-token window might exhaust their cap in around 30 hours, while a Max 20× user on a 32k-token window will exhaust theirs in about 35 to 39 hours instead of four times as long.
If you compact often, keep context windows small etc, I'd wager that your Opus 4 consumption would approach the expected 4× multiplier... In reality, I assume the majority of users aren't clearing their context windows and just letting the auto-compact do its thing.
Visualization: https://codepen.io/Sunsvea/pen/vENyeZe
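Spelling out the quadratic claim above (this is the prefill attention term only; real serving cost also has components that scale linearly with context, so the true per-request gap is smaller than this bound):

```latex
\text{attention FLOPs per full-window request} \;\propto\; n^{2}
\quad\Rightarrow\quad
\frac{\text{cost}(32\text{k})}{\text{cost}(8\text{k})}
  \approx \left(\frac{32{,}768}{8{,}192}\right)^{2} = 16
```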
That is true both on a relative scale ("20x") compared to my previous use of the $20 plan, but - more dishonestly, in my opinion - absolutely false when comparing my (minimal, single-session, tiny codebase) usage to the approximate usage numbers quoted in the marketing materials. The actual usage provided has regularly been 10% of the quoted allowance before caps are hit.
I have had responses from their CS team, having pointed this out, in the hope they would _at least_ flag to users times that usage limits are dramatically lower so that I can plan my working day a little better. I haven't received any sort of acknowledgement of the mismatch between marketing copy and delivered product, beyond promised "future fixes". I have, of course, pointed out that promised and hypothetical future fixes do not have any bearing on a period of paid usage that exists in the past. No dice!
I'm, unfortunately, a UK customer, and from my research any sort of recourse is pretty limited. But it has - without question - been one of the least pleasant customer experiences I've had with a company in some time, even allowing for Anthropic experiencing extremely high growth.
Claude Code Router has been a Godsend for my usage level. I'm not sure I can justify the time and effort to care and pursue Anthropic's functional bait-and-switch offering more than I already have, because being annoyed about things doesn't make me happy.
But I completely second this: it's not acceptable to sell customers a certain amount of a thing - and then deliver another - and I hope US customers (who I believe should have more recourse) take action. There are few other industries where "it's a language and compute black box!" would be a reasonable defence, and I think it sets a really bad precedent going forward for LLM providers.
One might imagine that Anthropic's recent ~$200m US gov contract (iirc) might allow for a bit of spare cash to, for example, provide customers with the product they paid for (let alone refund them, where necessary) but that does not seem to be the case.
It makes me sad to see a great product undermined like this, which is, I think, a feeling lots of people share. If anyone is actually working towards wider recourse, and would find my (UK) usage data useful, they're very welcome to get in touch.
https://www.anthropic.com/pricing
> Choose 5x or 20x more usage per session than Pro*

That first bullet pretty clearly implies 4x the usage and the last one implies that Max gets priority over Pro, not that 20x gets priority over 5x.
If a recruiter tells you you'll be getting "20x more money per hour" at this new startup, and you go there and you get only 6x, you're going to have a very different tone than "you sort of implied 20x".
Max plan
5x more usage than Pro $100.00/month + tax
Save 50% 20x more usage than Pro $200.00/month + tax
Especially with the "save 50%", if they're not actually offering 4x that of 5x, that's easily illegal false advertising in half the territories Anthropic's customers are located in.
I think the disconnect here is that the 5x or 20x is true within a single session (and you'll see their website seems to always say this, clearly their legal team went over it with a fine tooth comb). The above about weekly quotas etc., isn't within a single session so the 5 or 20x no longer applies.
Gross.
NOTHING breaks flow better than "Whoops! Time's up!"; it's worse than credit quotas -- at least then I can make a conscious decision to spend more money or not towards the project.
This whole 'twiddle your thumbs for 5 hours while the gpus cool off' concept isn't productive for me.
'35 hours' is absolutely nothing when you spawn lots of agents, and the damn thing is built to support that behavior.
I wouldn't call "spawning a lot of agents" to be a typical use case of the personal plan.
That was always in the domain of switching to a pay as you go API. It's nice that they allowed it on the fixed rate plans, but those plans were always advertised as higher limits, not unlimited.
Slowly bringing up prices as people get dependent sounds like a pretty decent strategy if they have the money to burn
It’s more likely that this sum is higher than they want. So really it’s not about predictability.
- a user subsidizing other users
- a user subsidized by other users
I don't know what OP prefers, but given that people are saying "woof, API pricing too expensive", it sounds like the latter.
The problem, of course, is the provider has to find a market where the one sustains the other. Are there enough users who would pay > $200/mo without getting their money's worth in order to subsidize users paying the same rate, but using more than the average? I think the non-existence of a higher-tier plan says there probably isn't, but I don't want to give too much credence to markets, economics, etc.
https://docs.anthropic.com/en/api/rate-limits#requirements-t...
Using the $20 Pro sub and for anything above Hello World project size, it's easy to hit the 5 hour window limit in just 2 hours. Most of the tokens are spent on Claude Code's own stupidity and its mistakes quickly snowballing.
1. Set up your dozens of /\.?claude.*\.(json|md)/i dotfiles?
2. Give insanely detailed prompts that took longer to write than the code itself?
3. Turn on auto-accept so that you can only review code in one giant chunk in diff, therefore disallowing you to halt any bad design/errors during the first shot?
> ...easy to hit the 5 hour window limit in just 2 hours
I've had this experience. Sucks especially when you're working in a monorepo because you have client/server that both need to stay in context.
One user consumed tens of thousands in model usage on a $200 plan. Though we're developing solutions for these advanced use cases, our new rate limits will ensure a more equitable experience for all users while also preventing policy violations like account sharing and reselling access.
This is why we can’t have nice things.
It's amazing how fast you go from thinking nobody could ever use that much of your service to discovering how many of your users are creatively abusing the service.
Accounts will start using your service 24/7 with their request rate coming in at 95% of your rate limiter setting. They're accessing it from a diverse set of IPs. Depending on the type of service and privacy guarantees you might not be able to see exactly what they're doing, but it's clearly not the human usage pattern you intended.
At first you think you can absorb the outliers. Then they start multiplying. You suspect batches of accounts are actually other companies load-splitting their workload across several accounts to stay under your rate limits.
Then someone shows a chart of average profit or loss per user, and there's a giant island of these users deep into the loss end of the spectrum consuming dollar amounts approaching the theoretical maximum. So the policy changes. You lose those 'customers' while 90+% of your normal users are unaffected. The rest of the people might experience better performance, lower latencies, or other benefits because the service isn't being bombarded by requests all day long.
Basically every startup with high usage limits goes through this.
Essentially people had all their security cameras and PVR units uploading endlessly to the cloud and Microsoft was footing the bill.
Then the 1TB limit came in to stop that.
It's nice to have an unlimited tier where there's no limit but you get your hand slapped when you go beyond reasonable. But people abuse shit like this and now lawyers have to get involved and we can't have the nice thing anymore.
Worked great for years, decades even, until crypto miners caught on - and maxed out the usage. Ruined it for the other 99.99% of renters.
clearly that's abusive and should be targeted. but in general idk how else any inference provider can handle this situation.
cursor is fucked because they are a whole layer of premium above the at-cost of anthropic / openai etc. so everyone leaves and goes to cc. now anthropic is in the same position but they can't cut any premium off.
you can't practically put a dollar cap on monthly plans because they are self exposing. if you say 20/mo caps at 500/mo usage then that's the same as a 480/500 (96%) discount against raw API calls. that's obviously not sustainable.
there's a real entitled chanting going on too. i get that it sucks to get used to something and have it taken away but does anyone understand that just the capex/opex alone is unsustainable, let alone the R&D to make the models and tools.
I’m not really sure what can be done besides a constant churn of "fuck [whoever had to implement sustainable pricing], i'm going to [next co who wants to subsidize temporarily in exchange for growth]".
i think it's shitty the way it's playing out though. these cos should list these as trial periods and be up front about subsidizing. people can still use and enjoy the model(s) during the trial, and some / most will leave at the end, but at least you don't get the uproar.
maybe it would go a long way to be fully transparent about the capex/opex/R&D. nobody is expecting a charity, we understand you need a profit margin. but it turns it from the entitled "they're just being greedy" chanting to "ok that makes sense why i need to pay X to have 1+ tireless senior engineers on tap".
You can't abuse a company by buying their services and using them according to their own terms and conditions. The T&C is already stacked against you; you're in a position of no leverage.
The correct solution is what Anthropic is doing here - change the T&C so you can make money. If you offer unlimited stuff, people will use it... unlimitedly. So, don't let them call your bluff.
Because of that, IMO end-users can't abuse the contract, no matter how hard they try. It's not on them to do that, because they have zero control over the contract. It's a have-your-cake-and-eat-it-too problem.
Anthropic simultaneously retains complete control of the contract, but they want to "outsource" responsibility for how it's used to their end-users. No... it's either one or the other. Either you're in complete control and therefore hold complete accountability, or you share accountability.
end users did have power. the power to use the service legitimately, even as a power user. two choices were possible, with the users given the power to decide:
1. use it for an entire 8 hour workday with 1-2 agents at most - limited by a what a human could possibly entertain in terms of review and guidance.
2. use it for 24 hours a day, 7 days a week with recursive agents on full opus blast. no human review could even be possible with this much production. it's the functional equivalent of one person managing a team of 10-20 cracked engineers on adderall that pump out code 24 hours a day.
the former was the extreme of a power user with a practical deliverable. the latter is a circus whose sole purpose is to push the bounds and tweet about it.
now the lawyers get some fresh work to do and everyone gets throttled. oh and that 2nd group? they'll be, and are, the loudest about how they've been "rug pulled just like cursor".
"you're not wrong, you're just an asshole" - the dude to walter.
(no particular offense directed, the you here is of course the "royal you").
Look, in my view, Anthropic made a mistake. And that's okay, we all do.
But I'm not going to let a multi-billion dollar company off the hook because some nobodies called them out on their bluff. No, Anthropic made the mistake, and now they're fixing it.
Ultimately, this came out of greed - but not the greed of the little people. Anthropic chose aggressive pricing because, like all somewhat large corporations, they usually opt for cheating instead of winning by value. What I mean is, Anthropic didn't strive for The Best product, they instead used their capital as collateral to sell a service at loss to squeeze competitors, particularly small, non-incumbent ones.
And, that's fine, it's a legitimate business strategy. Walmart does it, Amazon does it, whatever. But if that backfires, I don't care and I won't extend sympathy. Such a strategy is inherently risky. They gambled, people called their bluff, and now they're folding.
I’m not suggesting you be sympathetic to Anthropic. I'm suggesting sympathy for people who were using it legitimately, such as myself and others in areas where $200/mo is an extraordinary commitment, and who were not blind to, but appreciative of, their subsidizing the cost.
the core of my position is, was it necessary for people to use it wastefully because they could? what was gained from that activity? sticking it to that greedy corporation? did it outweigh what was lost to the other 95%+ of users?
i don't think we're debating from compatible viewpoints. i maintain it's not wrong, just abusive. you maintain it's not wrong, it is [was] allowed. so be it.
the party's over anyways. the result is an acceleration on the path of normalizing the true cost of usage and it's clear that will unfortunately, or maybe justifiably in your eyes, exclude a lot of people who can't afford it. cheers man.
Do you have a link?
I'm always curious to see these users after working at a startup that was the target of some creative use from some outlier customers.
not the tweet but here's a leaderboard of claude clowns bragging about their spend. maybe you can find their handles and ask them what MRR they hit spending $500k (not a typo) in credits.
https://www.viberank.app/
who knows, just something i came across when trying to find the twitter thread.
pointing to the most extreme example as if you can't stop it in its tracks is a bad argument. it's like saying we will now restrict sending of emails for everyone because this one spammer was sending 1000x the volume of an average or even power user, when you should just be solving the actual problem (identifying and stopping those that disrupt).
> This is why we can’t have nice things.
We're living in the worst world that Stallman could have predicted. One in which even HN agrees that people shouldn't be allowed to share or resell what they pay for.
All AI companies are hitting the same thing and dealing with the same play - they don't want users to think about cost when they're prompting, so they offer high cost flat fee plans.
The reality is though there will always be a cohort of absolute power users who will push the limits of those flat fee plans to the logical extremes. Startups like Terragon are specifically engineered to help you optimize your plan usage. This causes a cat and mouse game where they have to keep lowering limits as people work around them, which often results in people thinking about price more, not less.
Cursor has adjusted their limits several times, now Anthropic is, others will soon follow as they decide to stop subsidizing the 10% of extreme power users.
Just offer metered plans that let me use the web interface.
The "unlimited" lasted less than a week - it's been a shit show of cutting limits down ever since.
I'm reminded of online storage plans with various levels of "unlimited" messaging around them that can't even hold a single medium to large hard drive of data. Very few users hit that, most of whom don't even have a hard drive they regularly use, but it means they shouldn't be going anywhere near the word "unlimited".
will they refund me my sub?
when I subbed it was unlimited; they've rugged the terms twice since then in less than a month
Read the announcement. You are getting a full month's notice. If you don't like the limits, don't renew your subscription. Of course that doesn't help if your primary goal is to be an online outrage culture warrior.
Where did you see unlimited usage? The Max plan was always advertised as higher limits, not unlimited usage.
yes it was unlimited. so is the public water fountain. but if you show up and hold the button down to run nonstop while chanting "it says unlimited free water doesn't it??" you must expect that it will no longer be unlimited.
we went from reasonably unlimited, which 95% of users enjoyed, respected and recognized was subsidized, to no unlimited anymore because 5% wanted to abuse it. now you can scream about being rugged, just like you did for cursor, and jump to the next subsidized provider that you can abuse until there's none left. you do realize that every time "unlimited" gets abused it raises the standard of limits and pricing across the board until it becomes normalized. this was going to happen anyways on a longer timeframe where providers could optimize inference and models over time so the change wasn't so shocking, but abuse accelerated it.
The problem is this would reveal how expensive it _actually_ is to serve inference right now at the scale that people use it for productive things.
Last Friday I spent about $15 in 1 hour using claude code with API key, and the code doesn't really work, even though all the unit tests passed. I am not going to touch it for weeks, while the loss is fresh in my mind.
With a subscription though, you can keep on gambling, until you get a hit.
I have no idea if I’m in the top 5% of users. Top 1% seems sensible to rate limit, but top 5% at most SaaS businesses is the entire daily-active-users pool.
It’s an all you can eat buffet, you’re just not allowed takeout!
My issue is: a request made during peak usage is treated the same as a request made during low usage times even though I might not be able to get anything useful/helpful out of the LLM during those busy hours.
I've talked with coworkers and friends who say the same.
This isn't a problem with Claude specifically - seems to happen with all the coding assistants.
$100 doesn't even cover the electricity of running the servers every night, they were abusing a service and now everyone suffers because of them.
I don't know what there is to be mad about, or why you'd use dramatic language like "everyone suffers because of them".
This is clearly what was happening with the most extreme Claude Code users, because it's not actually that smart yet and still requires a human to often be in the loop.
However, Anthropic can't really identify "wasted code".
The price simply did not reflect the cost, and that's a problem. It happens to a lot of businesses, and sometimes consumers call their bluff. Whoops.
You wanna cheat and undercut competitors by shooting yourself in the foot with costs that exceed price? Fine. It's a tale as old as time. Here, have your loss leader - xoxo, every consumer.
Just charge per unit.
The tragedy of the commons is the concept that, if many people enjoy unfettered access to a finite, valuable resource, such as a GPU farm, they will tend to overuse it and may end up destroying its value altogether.
That is exactly what happened here. The price was fine if everyone upheld their moral obligation not to abuse it.
There's only one party that made a mistake here - Anthropic. They purposefully made their terms and conditions this way, and then when people played by the contract they set forth, they lost money. It's calling a bluff.
Anthropic purposefully priced this far too aggressively to undercut their competitors. Companies with stupid amounts of investor capital do that all the time. They flew too close to the sun.
You can't create a contract, have all the power in the world to rig the contract in your favor, and then complain about said contract. Everyone was following the rules. The problem was the rules were stupid.
To be more specific - abuse requires an exercise of power. End-users have no power at all. They have literally zero leverage over the contract and they have no power to negotiate. They can't abuse anything, they're too weak.
Again, there is no moral obligation to ensure Anthropic's business goes well and conveniently.
if your actions are defined by legal ToS then no, they didn't do anything wrong. they paid, it's the company's fault for not expecting someone to use 50-100x a reasonable usage.
if your actions are defined by ethical use then you understood that 50-100x the use would inevitably lead to ruining the party for everyone.
it's like a buffet. everyone pays a flat price and can enjoy a big meal for themselves. maybe sometimes having a few extra pieces of cake here and there. until someone shows up and starts stacking plates well beyond any reasonable expectation (not rule based) of a buffet customer. what's the result? stringent rules that are used for legal rather than rational enforcement.
it's obvious that even "reasonable use" is being subsidized, and the company was okay with doing so. until we have people running 10 opus instances in a gluttonous orchestra of agents just because they can. now the party is over. and i'd love to know what these claude agencies were even producing running 24/7 on opus. i can't imagine what human being even has the context to process what 24/7 teams of opus can put out. much like i can't imagine the buffet abuser actually enjoying the distending feast. but here we are.
Why are you assuming everyone will suffer?
They backtested the new limits on usage data and found it will begin to impact less than 5% of users.
But a compute-focused datacenter is probably not paying more than 10 cents per kWh, so $100 would pay for more than a 24/7 kilowatt of GPU plus cooling plus other overhead.
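Rough numbers behind that claim, as a sketch assuming ~$0.10/kWh for datacenter power:

```python
# Back-of-envelope: one kilowatt drawn continuously for a 30-day month
# at an assumed $0.10/kWh rate.
price_per_kwh = 0.10
kwh_per_month = 1 * 24 * 30          # 720 kWh
print(price_per_kwh * kwh_per_month)  # -> 72.0, i.e. ~$72 of a $100 plan
```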
I’m just curious how this decision came about. In most cases, I’ve seen either daily or monthly limits, so the weekly model stood out.
Option 1: You start out bursting requests, and then slow them down gradually, and after a "cool-down period" they can burst again. This way users can still be productive for a short time without churning your servers, then take a break and come back.
Option 2: "Data cap": like mobile providers, a certain amount of full-speed requests, and after that you're capped to a very slow rate unless you pay for more. (this one makes you more money; a rough sketch of this follows after Option 3)
Option 3: Infrastructure and network level adaptive limits. You can throttle process priority to de-prioritize certain non-GPU tasks (though I imagine the bulk of your processing is GPU?), and you can apply adaptive QoS rules to throttle network requests for certain streams. Another one might be different pools of servers (assuming you're using k8s or similar), and based on incoming request criteria, schedule the high-usage jobs to slower servers and prioritize faster shorter jobs to the faster servers.
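For what it's worth, Options 1 and 2 both reduce to a token-bucket shape: burst up front, then a slow refill acts as the cool-down or the post-cap throttle. A minimal sketch with made-up numbers:

```python
# Rough sketch of Options 1-2: allow a burst, then throttle until the bucket
# refills over a cool-down window. All constants here are illustrative only.
import time

class BurstThenThrottle:
    def __init__(self, burst_tokens: float = 100.0, refill_per_sec: float = 0.01):
        self.capacity = burst_tokens          # how much a user can burst before slowing down
        self.tokens = burst_tokens
        self.refill_per_sec = refill_per_sec  # slow refill = the "cool-down period"
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the cap: reject, or route to a slow lane (Option 2 style)
```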
And aside from limits, it's worth spending a day tracing the most taxing requests to find whatever the least efficient code paths are and see if you can squash them with a small code or infra change. It's not unusual for there to be inefficient code that gives you tons of extra headroom once patched.
Probably phrased to sound small, but as someone used to seeing 99% uptime (or, conversely, 1% down) as bad and affecting lots and lots of users, this feels massive. If you have half a million users (I have no idea, just a ballpark guess), then you're saying this will affect just shy of 25 thousand people - the ones who use your product the most. Oof!
(Congrats on 777 karma btw :). No matter the absolute number on sites like these, I always still love hitting palindromes or round numbers or such myself)
Seems like some people are account-sharing or scripting/repackaging to such an extent that they were able to "max out" the rate limit windows.
Ultimately - this all gets priced in over time; whether that's in a subscription change or overall rate limit change, etc.
So if you want to simply use it as intended, over time stopping this kind of pattern is better for us?
Some stuff I’ve used it for in the last day: figuring out what a family member needs for FAFSA as a nontraditional student, help identify and authenticate some rare first editions and incunabula for a museum collection I volunteer at, find a list of social events in my area (based on my preferences) that are coming up in the next week (Chatgpt Agent works surprisingly well for this too), adapting Directus and Medusa to my project’s existing schema and writing up everything I need to migrate, and so on.
Deep research really hits the Claude limits hard and that’s the best way to avoid hallucinations when asking an important question or making it write complex code. I just switch from Claude to ChatGPT/Gemini until the limits reset but Claude’s deep research seems to handily beat Gemini (and OpenAI isnt even in the running). DR queries take much longer (5-10 min in average) but have much more in depth and accurate answers.
I can see how work involving larger contexts and deeper consideration would lead to exhausting limits a lot faster though, even if you aren't using it like a slot machine.
Isn't this something you can do with a simple Google search? Or Perplexity?
No need to shove by far the most expensive LLM (Claude Opus 4) at it.
Collate all the LA Metro area events from different sources and whip up an app or web site where people can filter them and subscribe to the events in Google Calendar or in .ical format.
You can even have Claude vibe code it for you :)
Then not using the cannonball is just a waste of time, which is a heck of a lot more valuable than some purist aversion to using LLMs to save time and effort.
I know LLMs aren't as much of an environmental scourge as people sometimes make them out to be, but if they're used eagerly and aggressively, their impacts certainly have a capability of scaling in concerning ways.
I assume that the people hitting limits are just letting it cycle, but doesn't that just create garbage if you don't keep it on a tight leash? It's very eager but not always intelligent.
The issue could be, in part, that a lot of users don't care to be efficient with token usage and maintaining condensed, efficient, focused contexts to work with.
I have two problems with that. Firstly, I want my code to be written a particular way, so if it's doing something out of left field then I have to reject it on stylistic grounds. Secondly, if its solution is too far from my expectation, I have to put more work into review to check that its solution is actually correct.
So I give it a "where, what, how" prompt. For example, "In file X add feature Y by writing a function with signature f(x: t), and changing Z to do W..."
It's very good at following directions, if you give it the how hints to narrow the solution space.
I haven't yet seen anyone doing anything remarkable with their extensive use of Claude. Without frequent human intervention, all of it looks like rapid regression to the mean, or worse.
I see so many folks claiming crazy hardware rigs and performance numbers so no idea where to begin. Any good starting points on this?
(Ok, budget is TBD - but seeing a "you get X for $Y" would at least help make an informed decision).
- 2x 4070 Ti (32 GB total VRAM) - $2200
- 64 GB RAM - $200-250
- Core i9/Ryzen 9 CPU - $450
- 2 TB SSD - $150
- Motherboard, cooler, case, PSU - $500-600
Total - ~$3500-3700, say $4000 with extras.
I'm curious how much lower quality we're talking about here. Most of the work I ever get an LLM to do is glue code, or trivial features. I'd expect some fine-tuned Codestral-type model with well-focused tasks could achieve good performance locally. I don't really need world-leading-expert-quality models to code up a hamburger menu in a React app & set the background-color to #A1D1C1.
My other worry about the Mac is how non-upgradable it is. Again, not sure how fruitful it is - in my (probably fantasy-land) view, if I can set up a rig and then keep updating components as needed, it might last me a good 5 years, say for $20k over that period? Or is that too hopeful?
So for $20k over 5 years, or $4k per year, it comes to about $330 a month (ish) - somewhere between one and two $200/mo Max subscriptions. Let's be honest: right now, with these limits, running more than one in parallel is going to be forbidden.
if I can run 2 Claude-level models (assuming the DS and Qwens are there) then I am already breaking even, but without having to participate in training with all my codebases (and I assume being free of that actually unlocks something new in the process).
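As a quick sanity check on the amortization above (same assumed figures):

```python
# Amortizing the rig-plus-upgrades budget from the comment above.
total_spend = 20_000      # assumed total over the period
years = 5
per_month = total_spend / (years * 12)
print(round(per_month))   # -> 333, i.e. roughly $330-350 a month
```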
https://www.reddit.com/r/LocalLLaMA/comments/1iqpzpk/8x_rtx_...
You can look for more rig examples on that subreddit.
I imagine there are also going to be some problems hooking something like that up to a normal wall socket in North America? (I, like the reddit poster, am in Europe, so on 220V)
I use 208V power, but 120V can indeed be a challenge. That said, it's a bit of a misunderstanding of how US power works: the USA has split-phase wiring, so every house has 220-240V on tap if they need it - it's just that typical outlets are 110-120V.
1. switch models using /model
2. message
3. switch back to opus using /model
Help me help you (manage usage) by allowing me to submit something like "let's commit and push our changes to github #sonnet". Tasks like these rarely need opus-level intelligence and it comes up all the time.
https://docs.anthropic.com/en/docs/claude-code/sub-agents
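For what it's worth, the sub-agents linked above look like one way to get at this today: a minimal sketch of a project-level sub-agent that handles commit/push chores. This assumes the Markdown-with-frontmatter format from those docs (name/description/tools); the `model` field in particular is an assumption on my part and may not exist or be spelled this way.

```markdown
---
# Hypothetical example in .claude/agents/committer.md; field names follow the
# sub-agent docs linked above, and the `model` field is an assumption.
name: committer
description: Handles routine git chores (commit, push) so the main session can stay on Opus for harder work.
tools: Bash, Read, Grep
model: sonnet
---
You handle git housekeeping only. Write concise commit messages, never amend
history, and push to the current branch unless told otherwise.
```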
We don't know what the limits are, what conditions change the limits dynamically, and we cannot monitor our usage towards the limits.
1. 5 hour limit
2. Overall weekly limit
3. Opus weekly limit
4. Monthly limit on number of 5 hour sessions
I also have to wonder how much Sub Agents and MCP are adding to the usage; sub agents are brand new and won't even be reflected in that 95% statistic.
At the end of this email there are a lot of unknowns for me (am I in the 5%? will I get cut off? am I about to see my usage increase now that I added a few sub agents?). That's not a good place to be as a customer.
It has become a kind of goal to hit it twice a day. It means I've had a productive day and can go on and eat food, touch grass, troll HN, read books.
I'm on Claude Code after hitting Cursor Pro's limit for the month. It makes more sense to subscribe to a bunch of different tools at $20/month each than to spend $100/month on one tool that throws overloaded errors. We'll probably get more uptime with the weekly restriction.
I'll keep OpenAI - they don't even let me use CLIs with it, but they're at least honest about their offerings.
Also, their app doesn't ever tell you to go fuck off if you're Pro.
I'd be pretty surprised if I were to get rate limited, but I do use it a fair amount and really have no feel for where I stand relative to other users. Am I in the top 5%? How should I know?
https://openrouter.ai/z-ai/glm-4.5
It's even possible to point Claude Code CLI to it
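A hedged sketch of what "pointing Claude Code at it" could look like. Assumptions: OpenRouter's native API is OpenAI-style, so some Anthropic-compatible proxy has to sit in between, and Claude Code honors the ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN environment variables for a custom gateway; the localhost URL below is a placeholder, not a real endpoint.

```python
# Minimal sketch, not a recipe: launch the Claude Code CLI against a local
# proxy that fronts GLM-4.5 with an Anthropic-compatible API. The URL and the
# key lookup below are assumptions/placeholders.
import os
import subprocess

env = os.environ.copy()
env["ANTHROPIC_BASE_URL"] = "http://localhost:4000"                # hypothetical translating proxy
env["ANTHROPIC_AUTH_TOKEN"] = os.environ.get("OPENROUTER_API_KEY", "")

# One-off, non-interactive run (`claude -p` is print mode).
subprocess.run(["claude", "-p", "explain what this repo does"], env=env, check=False)
```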
The one thing that bugs me is the lack of visibility into how far through your usage you are. Only being told when you're close to the end means I cannot plan. I'm not expecting an exact %, but a few notices at intervals (e.g. halfway through) would help a lot. Not providing this kinda makes me worry they don't want us to measure. (I don't want to measure closely, but I do want a sense of where I'm at.)
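Even something as coarse as this would help. Purely illustrative, since neither a running total nor the cap is exposed today; both numbers below are made up:

```python
# The kind of interval notices the comment asks for, assuming the client could
# see a running token total and a known weekly cap (both hypothetical here).
WEEKLY_CAP_TOKENS = 10_000_000          # hypothetical cap
THRESHOLDS = (0.5, 0.8, 0.95)           # notify at 50%, 80%, 95%

def usage_notices(used_tokens: int, already_notified: set[float]) -> list[str]:
    """Return any new notices for thresholds crossed since the last check."""
    notices = []
    frac = used_tokens / WEEKLY_CAP_TOKENS
    for t in THRESHOLDS:
        if frac >= t and t not in already_notified:
            already_notified.add(t)
            notices.append(f"Heads up: you've used {t:.0%} of this week's allotment.")
    return notices

# Example: 8.2M tokens used -> emits the 50% and 80% notices.
seen: set[float] = set()
print(usage_notices(8_200_000, seen))
```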
Who does that benefit? Does number of accounts beat revenue in their investor reports?
Waiting for higher valuations till someone pulls the trigger for acquisition.
I don't see IPOs being successful, because not everyone gets a conman like Elon as their frontman who can consistently inflate the balloon with unrealistic claims for years.
Is this way too complicated? It feels complicated to me and I worked on it, so I presume it is?
I don't want to end up in some "you can work for X number of hours" situation that seems... not useful to engineers?
How do real world devs wanna consume this stuff and pay for it so there is some predictability and it's useful still?
Thank you. :)
Anyway, I've been resigned to this for a while now (see https://x.com/doodlestein/status/1949519979629469930 ) and ready to pay more to support my usage. It was really nice while it lasted. Hopefully, it's not 5x or 10x more.
Hopefully they sort it out and increase limits soon. Claude Code has been a game-changer for me and has quickly become a staple of my daily workflows.
This is also exactly why I feel this industry is sitting atop a massive bubble.
you...already were? it already had a variety of limits, they've just added one new one (total weekly use to discourage highly efficient 24/7 use of their discounted subscriptions).
I can understand setting limits, and I'd like to be aware of them as I'm using the service rather than get hit with a week long rate limit / lockout.
Upshot - I will probably go back to api billing and cancel. For my use cases (once or twice a week coding binges) it’s probably cheaper and definitely less frustrating.
Or is that a silly idea, because distillation is unlikely to be stopped by rate limits (i.e., if distillation is a worthwhile tactic, companies that want to distill from Anthropic models will gladly spend a lot more money to do it, use many, many accounts to generate synthetic data, etc.)?
If I’m on annual Pro, does it mean these won’t apply to me till my annual plan renews which is several months away.
What are the reasonable local alternatives? 128 GB of ram, reasonably-newish-proc, 12 GB of vram? I'm okay waiting for my machine to burn away on LLM experiments I'm running, but I don't want to simply stop my work and wake up at 3 AM to start working again.
I think you're just confused about what the Pro plan was, it never included being used for 168 hours/week, and was extremely clear that it was limited.
> What are the reasonable local alternatives? 128 GB of ram, reasonably-newish-proc, 12 GB of vram? I'm okay waiting for my machine to burn away on LLM experiments I'm running, but I don't want to simply stop my work and wake up at 3 AM to start working again.
a $10k Mac Studio with 192GB of unified memory running any model you can download still isn't close to Claude Sonnet.
The weekly limits are probably there to fix the account-sharing issue. For example, I wanted to ask a friend who uses the most expensive subscription for work if I could borrow the account at night and on weekends. I guess that's the kind of pattern they want to stop.
Somehow you're "not allowed" to run your account 24/7. Why the hell not? Well because then they're losing money. So it's "against their ToS". Wtf? Basically this whole Claude Code "plan" nonsense is Anthropic lighting VC on fire to aggressively capture developer market share, but enough "power users" (and don't buy the bullshit that it's "less than 5%") are inverting that cost:revenue equation enough to make even the highly capitalized Anthropic take pause.
They could have just emailed the 4.8% of users doing the dirty, saying "hey, bad news". But instead EVERYONE gets an email saying "your access to Claude Code's heavily subsidized 'plans' has been nerfed".
It's the bait and switch that just sucks the most here, even if it was obviously and clearly coming a mile away. This won't be the last cost/fee balancing that happens. This game has only gotten started. 24/7 agents are coming.
Frustrated users, who are probably using the tools the most will try other code generation tools.
That said, there's no fucking way I am getting what they claim w/Opus in hours. I may get two to three queries answered w/Opus before it switches to Sonnet in CC.
Notice they didn't say 5% of Max users. Or 5% of paid users. To take it to the extreme - if the free:paid:max ratio were 400:20:1 then 5% of users would mean 100% of a tier. I can't tell what they're saying.
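Working through that hypothetical ratio:

```python
# The made-up ratio from the comment: 400 free : 20 paid : 1 max.
free, paid, max_tier = 400, 20, 1
total = free + paid + max_tier
print(0.05 * total)   # -> 21.05: "5% of users" is about the size of the entire paid tier
```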
There are people that will always try to steal, but there may also be those that just don't understand their pricing.
Also some people keep going forever in the same session, causing it to max out - since the whole history is sent in every request. Some prompting about things like that (your thread has gotten long..) would probably save quite a bit of usage and prevent innocent users from getting locked out for a week.
Claude is vital to me and I want it to be a sustainable business. I won't hit these limits myself, and I'm saving many times what I would have spent in API costs - easily among the best money I've ever spent.
I'm middle aged, spending significant time on a hobby project which may or may not have commercial goals (undecided). It required long hours even with AI, but with Claude Code I am spending more time with family and in sports. If anyone from Anthropic is reading this, I wanted to say thanks.
If the new limits are anything less than 24 * 7 / 5 times the previous limits then power users are getting shafted (which understandably is the point of this).
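For reference, the 24 * 7 / 5 factor just counts how many 5-hour windows fit in a week:

```python
# A round-the-clock user could previously refill every 5-hour window,
# so their weekly ceiling was this many windows' worth of quota.
windows_per_week = 24 * 7 / 5
print(windows_per_week)   # -> 33.6
```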
What's worse with this model is that a runaway process could chew through your weekly API allotment on a wild goose chase. Whereas before the 5-hour quantization was both a limiter and guard rails.
Then again, to scale is human
This is the most exciting business fight of our time and I’m chomping popcorn with glee.
I think Anthropic is grossly overestimating the addressable market of a CLI tool, while also falsely believing they have a durable lead right now in their model, which I’m not so sure of. Also their treatment of their partners has been…shall we say…questionable. These are huge missteps at a time they should be instead hitting the gas harder imo.
They’re getting cocky. Would love to see a competitor to swoop in and eat their lunch.
It's easy to forget that the product Anthropic are selling here, and throttling, is based on data they mostly pay little or no content fee for.
How about adding a ToS clause to prevent abuse? Wouldn't that be better than a blanket change that negatively affects the other 95%?
Anthropic seems like they need to boost up their infra as well (glad they called this out), but the insane over-use can only be hurting this.
I just can't cosign on the waves of hate that all hinges on them adding additional limits to stop people from doing things like running up $1k bills on a $100 plan or similar. Can we not agree that that's abuse? If we're harping on the term "unlimited", I get the sentiment, but it's absolutely abuse and getting to the point where you're part of the 5% likely indicates that your usage is abusive. I'm sure some innocent usage will be caught in this, but it's nonsense to get mad at a business for not taking the bath on the chunk of users that are annihilating the service.
I switched to Claude Code because of Cursor’s monthly limits.
If I run out of my ability to use Claude Code, I’m going to just switch back to Cursor and stay there. I’m sick of these games.
If you think it’s ok, then make Anthropic dog food it by putting every employee in the pro plan and continue to tell them they must use it for their work but they can’t upgrade and see how they like it.
We know inference is very expensive so it's not reasonable to expect unlimited usage in general...
> sounds like it affects pretty much everyone who got some value out of the tool
Feels that way.
But compared to paying the so-called API pricing (hello ccusage) Claude Code Max is still a steal. I'm expecting to have to run two CC Max plans from August onwards.
$400/mo here we come. To the moon yo.
...are you allowed to do that? I guess if they don't stop you, you can do whatever you want, but I'd be nervous about an account ban.
Say an 8xB200 server costs $500,000, with 3 years of depreciation, so ~$166k/year per server. Say 10 people share that server full time, so that's ~$16.7k/year/person to break even, or a ~$1,388/month subscription at 10 users per server.
If they get it down to 100 users per server (doubt it), then they can break even at $138/month.
And all of this is just server costs...
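Restating that break-even math (all figures are the assumptions above):

```python
# Break-even subscription price per user, given the assumed server cost,
# depreciation period, and users per server from the comment above.
server_cost = 500_000          # assumed 8x B200 server
years = 3
per_user_per_month = server_cost / years / 10 / 12
print(round(per_user_per_month))              # -> 1389, i.e. ~$1,388/month at 10 users/server
print(round(server_cost / years / 100 / 12))  # -> 139, the ~$138/month figure at 100 users/server
```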
Seems AI coding agents should be a lot more expensive going forward. I'm personally using 3-4 agents in parallel as well..
Still, it's a great problem for Anthropic to have. "Stop using our products so much or we'll raise prices!"
A realistic business plan would be to burn cash for many years (potentially more than a decade), and bank on being able to decrease costs and increase revenue over that time. Investors will be funding that journey.
So it is way too early to tell whether the business plan is unsustainable. For sure the unit economics are going to be different in 5 and 10 years.
Right now is very tough though, since it is basically all early-adopter power-user types, who burn a lot of compute. Later on one can probably expect more casuals, maybe even a significant number of "gym users" who pay but basically never use the service. Though OpenAI is currently stronger for casuals, I suspect.
Over the next decade, hardware costs will go down a lot. But they have go find a way to stay alive (and competitive) until then.
PS. Ah! Of course. Agents ...
The problem is, we have no visibility into how much we’ve actually used or how much quota we have left. So we’ll just get throttled without warning—regularly. And not because we’re truly heavy users, but because that’s the easiest lever to pull.
And I suspect many of us paying $200 a month will be left wondering, “Did I use it too much? Is this my fault?” when in reality, it never was.
that's exactly what they've done? they've even put rough numbers in the email indicating what they consider to be "abusive"?
I use CC 1-3h a day. Three days a week. Am I a heavy user now? Will I be in the 5% group? If I am, who will I argue with? Anthropic says in its mail that I can cancel my subscription.
Charging a low flat fee per use and still warning when certain limits hit is possible. But it's market segmentation not to do it. Just charge a flat fee, then lop off the high-end, and you maximize profit.
The two models are not just the best models for coding at this point (in areas like UX/UI and following instructions they are unmatched); they come packaged with possibly the best command line tool today.
They invite developers to use them a lot. Yet for the first time ever, I feel like I cannot 100% rely on the tool, and feel a lot of pressure when using it. Not because I don't want to pay, but because the options are either:
> A) Pay $200 and be constantly warned by the system that you are close to hitting your quota (very bad UX)
> B) Pay $$$??? via the API and see how your bill grows to +$2k per month (this is me this month via Cursor)
I guess Anthropic has the great dilemma now: should they make the models more efficient to use and lower the prices to increase limits and boost usage OR should they cash in their cash cows while they can?
I am pretty sure no other model comes even close in terms of developer-hours at this point. Gemini would be my 2nd best guess, but Gemini is still lagging behind Claude, and not that good at agentic workloads.
Why not use the user's timezone?
We're going to punish the 5% that are using our service too much.
If you work on some overengineered codebase, it will produce overengineered code; this requires more tokens.
When you have your functional spec and your tech spec, ask it to implement it. Additionally, add some basic rules, say stuff like "don't add any fallback mechanisms or placeholders unless asked; keep a todo of where you're at; ask questions if unsure."
The key is to communicate well: ALWAYS READ what you input. Review, and provide feedback. Also, I'd recommend doing smaller chunks at a time once things get more complicated.
- It would be nice to know if there was a way to know or infer, percentage-wise, the amount of capacity a user is currently using (rate of usage) and has left, compared to available capacity. Being scared to use something is different from being mindful of it.
- Since usage can feel a little subjective/relative (simple things might use more tokens, or fewer, etc.) and depends on things beyond a user's own usage, it would be nice to know how much capacity is left, both in the current window and over the month, so there's a chance to learn.
- If there are lower "capacity" usage rates available at night vs. the day, or just during slower times, it might be worth knowing. It would help users who would like to plan around it, compared to people who might just be making the most of it.
You having the same issue kills the point of using you.
It makes no sense to me that you would tell customers “no”. Make it easy for them to give you more money.
this entire thread is people whinging about the "you get some reasonable use for a flat fee" product having the "reasonable use" level capped a bit at the extreme edge.
leveling the playing field i see lol
Economists are having a field day.
i just found ccusage, which is very helpful. i wish i could get it straight from the source, i don't know if i can trust it... according to this i've spent more than my $200 monthly subscription in token value basically daily... 30x the supposed cost
i've been trying to learn how to make claude code use opus for planning and sonnet for execution automatically, if anyone has a good example of this please share
Use /resume
Part of the reason there is so much usage is because using claude code is like a slot machine: SOMETIMES it's right, but most times it needs to rework what it did, which is convenient for them. Plus their pricing is anything but transparent as to how much usage you actually get.
I'll just go back to ChatGPT. This is not worth the headache.
I assume this is the end of the viability of the fixed price options.
> it affects pretty much everyone who got some value out of the tool AND is not a casual coder.
I didn’t mean casual in the negative sense, so there is no “better”, there is only casual and not casual.
My theory is that 5% sounds like a small number until you realize that many people just tried it and didn't like it, forgot to cancel, have it paid for by an employer wishing for 100x improvements, or have found AI useful only in certain scenarios that they face every once in a while, etc.
We do know that PR teams enjoy framing things in the most favorable light possible.
I am not saying this is what must happen here, but I see no effort to substantiate why it could not.
can someone please find a conservative, sustainable business model and stick with it for a few months instead of this mvp moving-target bs
Seems pretty standard to me.
The buffet-style pricing gets you more bang for the buck. How much more? That bit is uncertain. Adjust your expectations accordingly.