This is why it’s so critical to have open source models.
In a year or so, the open source models will become good enough (in both quality and speed) to run locally.
Arguably, OpenAI's gpt-oss-120b is already good enough, in both quality and speed, to run on a Mac Studio.
Then $10k, amortized over 3 years, will be enough to run code LLMs 24/7.
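A rough sketch of that arithmetic; the hardware price and 3-year window are taken from this comment, and the $200/mo comparison tier is purely an illustrative assumption:

    # Back-of-the-envelope: amortized local hardware vs. a hosted subscription.
    hardware_cost = 10_000            # e.g. a high-spec Mac Studio (assumed)
    months = 36                       # 3-year amortization
    local_monthly = hardware_cost / months
    subscription_monthly = 200        # illustrative "pro" coding-agent tier (assumed)
    print(f"local: ~${local_monthly:.0f}/mo vs hosted: ${subscription_monthly}/mo")
    # local: ~$278/mo, and the box can run inference 24/7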
I hope that’s the future.
habosa · 42m ago
Every business building on LLMs should also have a contingency plan in case it needs to move to an all open-weights strategy. Nothing stops OpenAI / Anthropic / Google from 100x-ing prices, limiting access, dropping old models, or outright competing with their customers. Building your whole business on top of them will prove as foolish as all of the media companies that built on top of Facebook and got crushed later.
OfficialTurkey · 29m ago
Couldn't you also make this argument about cloud infrastructure from the standard hyperscaler cloud providers (AWS, GCP, ...)? For that matter, couldn't you make it about any dependency your business purchases from other businesses that compete with each other to provide it?
empiko · 12m ago
In general you are right, but AI as a field is still pretty volatile. Token producers are still pivoting and are generally losing money. They will have to change their strategy sooner or later, and there is a good chance users will not be happy about it.
skybrian · 2h ago
Open source models could be run by low-cost cloud providers, too. They could offer discounts for a long term contract and run it on dedicated hardware.
qingcharles · 2h ago
This. Your local LLM, even if shared between a pool of devs, is probably only going to be working 8 hours a day. Better to use a cloud provider, especially if you can find a way to ensure data security, if that is an issue for you.
wongarsu · 1h ago
Exactly. There is no shortage of providers hosting open source models with per-token pricing, at a variety of speeds and context sizes and price points. Competition is strong and barriers to entry are low, which keeps margins thin and prices fair.
If you want complete control over your data and don't trust anyone's assurances that they keep it private (and why should you?), then you have to self-host. But if all you care about is a good price, the free market already provides that for open models.
hkt · 38m ago
Hetzner and Scaleway already do instances with GPUs, so this kinda already exists.
It might be fun to work out how to share, too. A whole new breed of shell hosting.
6thbit · 1h ago
Many of the larger enterprises (retail, manufacture, insurance, etc.) have steadily gone cloud-only or have massively reduced their data center footprint over the last 10 years.
Do you think these enterprises will begin hosting their own models? I'm not convinced they'll join the capex race to build AI data centers. It would make more sense they just end up consuming existing services.
Then there are the smaller startups that never had their own data center. Are those going to start self-hosting AI models, along with everything needed to let, say, a few hundred employees access a local service at once: networking, HA, upgrades, etc.? Say you also have multiple offices in different countries, and so on.
nunez · 1h ago
> Do you think these enterprises will begin hosting their own models? I'm not convinced they'll join the capex race to build AI data centers. It would make more sense they just end up consuming existing services.
they already are
physicsguy · 1h ago
> manufacture
They're much less strict about cloud than they used to be, but the security practices are still quite strict. I work in this sector and yes, they'll allow cloud, but strong data isolation and segregation, access controls, networking requirements, etc. are very much still a thing in the industry, particularly where the production process is itself commercially sensitive.
g42gregory · 1h ago
Enterprises (depending on the sector, think semi manufacturing) will have no choice for two reasons:
1. Protecting their intellectual property, and
2. Unknown “safety” constraints baked in. Imagine an engineer unable to run some security tests because the LLM thinks it’s “unsafe”. Meanwhile, the VP of Sales is on the line with the customer.
asadm · 2h ago
Even if they do get better, the latest closed-source {gemini|anthropic|openai} model will always be insanely good, and it would be dumb to use a local one from 3 years back.
Also tooling: you can use Aider, which is OK, but Claude Code and Gemini CLI will always be superior and will only work correctly with their respective models.
asgraham · 1h ago
I don’t know about your first point: at some point the three-year difference may not be worth the premium, as local models reach “good enough.”
But the second point seems even less likely to be true: why will Claude Code and Gemini CLI always be superior? Other than advantageous token prices (which the people willing to pay the aforementioned premium shouldn’t even care about), what do they inherently have over third-party tooling?
nickstinemates · 1h ago
Even using Claude Code vs. something like Crush yields drastically different results. Same model, same prompt, same cost... the agent is a huge differentiator, which surprised me.
asgraham · 1h ago
I totally agree that the agent is essential, and that right now Claude Code is semi-unanimously the best agent. But agentic tooling is written, not trained (as far as I can tell—someone correct me) so it’s not immediately obvious to me that a third-party couldn’t eventually do it better.
Maybe to answer my own question, LLM developers have one, potentially two advantages over third-party tooling developers:
1) virtually unlimited tokens, zero rate limiting with which to play around with tooling dev.
2) the opportunity to train the network on their own tooling.
The first advantage is theoretically mitigated by insane VC funding, but will probably always be a problem for OSS.
I’m probably overlooking news that the second advantage is where Anthropic is winning right now; I don’t have intuition for where this advantage will change with time.
SparkyMcUnicorn · 2h ago
I use Claude Code with other models sometimes.
For well defined tasks that Claude creates, I'll pass off execution to a locally run model (running in another Claude Code instance) and it works just fine. Not for every task, but more than you might think.
hoppp · 2h ago
I am looking forward to the AMD Ryzen AI Max+ 395 PCs coming down in price.
Thanks to that generation of chips, local inference speed will be acceptable in 5-10 years, and we can finally have good local AI apps.
okdood64 · 2h ago
What's the performance of running gpt-oss-120b on a Mac Studio compared to running a paid-subscription frontier LLM?
jermaustin1 · 2h ago
I will answer for the 20B version on my RTX3090 for anyone who is interested (SUPER happy with the quality it outputs, as well). I've had it write a handful of HTML/CSS/JS SPAs already.
With medium and high reasoning, I will see between 60 and 120 tokens per second, which is outrageous compared to the LLaMa models I was running before (20-40tps - I'm sure I could have adjusted parameters somewhere in there).
ivape · 2h ago
Do we know why it’s so fast, hardware aside?
mattmanser · 2h ago
Because he's getting crap output. Open source locally on something that under-powered is vastly worse than paid LLMs.
I'm no shill, I'm fairly skeptical about AI, but I've been doing a lot of research and playing around to see what I'm missing.
I haven't bothered running anything locally as the overwhelming consensus is that it's just not good enough yet. And that's from posts and videos in the last two weeks.
I've not seen something so positive about local LLMs anywhere else.
It's simply not there yet, and it definitely isn't for a 4090.
jermaustin1 · 47m ago
That is a bit harsh. I'm actually quite pleased with the code it is outputting currently.
I'm not saying it is anywhere close to a paid foundation model, but the code it is outputting (albeit simple) has been generally well written and works. I do only get a handful of those high-thought responses before the 50k token window starts to delete stuff, though.
ivape · 1h ago
I guess I meant how is a 20b param model simply faster than another 20b model? What techniques are they using?
medvezhenok · 1h ago
It's a MoE (mixture of experts) architecture, which means only 3.6 billion parameters are activated per token (out of 20b total for the model). So it should run at the same speed a 3.6b model would, assuming all of the parameters fit in VRAM.
Generally, 20b MoE will run faster but be less smart than a 20b dense model. In terms of "intelligence" the rule of thumb is the geometric mean between the number of active parameters and the number of total parameters.
So a 20b model with 3.6b active (like the small gpt-oss) should be roughly comparable in terms of output quality to a sqrt(3.6*20) = 8.5b parameter model, but run with the speed of a 3.6b model.
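A minimal sketch of that rule of thumb, treating the geometric mean purely as the folk heuristic described above rather than an exact law:

    import math

    def moe_equivalent_dense_size(total_params_b: float, active_params_b: float) -> float:
        """Folk heuristic: a MoE model 'feels' like a dense model whose size is
        the geometric mean of total and active parameters."""
        return math.sqrt(total_params_b * active_params_b)

    # gpt-oss-20b: ~20B total parameters, ~3.6B active per token
    print(moe_equivalent_dense_size(20, 3.6))  # ~8.49 -> roughly an 8.5B dense model
    # Decode speed, by contrast, tracks the ~3.6B active parameters when weights fit in VRAM.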
andrewmcwatters · 2h ago
Chiming in here: an M1 Max MacBook Pro (64GB) running gpt-oss:20b via Ollama in Visual Studio Code with GitHub Copilot is unusably slow compared to Claude Sonnet 4, which requires (I think?) GitHub Copilot Pro.
But I'm happy to pay the subscription vs buying a Mac Studio for now.
Jimpulse · 1h ago
Ollama's implementation for gpt-oss is poor.
root_axis · 2h ago
> In a year or so, the open source models will become good enough (in both quality and speed) to run locally.
"Good enough" for what is the question. You can already run them locally, the problem is that they aren't really practical for the use-cases we see with SOTA models, which are just now becoming passable as semi-reliable autonomous agents. There is no hope of running anything like today's SOTA models locally in the next decade.
cyanydeez · 2h ago
they might be passable, but there's zero chance they're economical atm.
coldtea · 56m ago
>In a year or so, the open source models will become good enough (in both quality and speed) to run locally.
Based on what?
And where? On systems < 48GB?
moritzwarhier · 1h ago
After trying gpt-oss:20b, I'm starting to lose faith in this argument, but I share your hope.
Also, I've never tried really huge local models and especially not RAG with local models.
jvanderbot · 1h ago
It's not hard to imagine a future where I license their network for inference on my own machine, and they can focus on training.
holoduke · 2h ago
The problem is that running an LLM locally really eats all your resources. I tried it, and the whole system becomes unresponsive and slow. We'd need a minimum of 1 TB of memory and dedicated processors to offload to.
cyanydeez · 2h ago
It's not. Capitalism isn't about efficiency; it's about lock-in. You can't lock in open source models. If fascism under Republicans continues, you can bet they'll be shut down due to child safety or whatever excuse the large corporations need to turn off the free efficiency.
aydyn · 2h ago
This is unrealistic hopium, and deep down you probably know it.
There's no such thing as models that are "good enough". There are models that are better and models that are worse and OS models will always be worse. Businesses that use better, more expensive models will be more successful.
ch4s3 · 2h ago
> Businesses that use better, more expensive models will be more successful.
Better back-of-house tech can differentiate you, but startup history is littered with failed companies using the best tech, and they were often beaten by companies taking a worse-is-better approach. Anyone here who has been around long enough has seen this play out a number of times.
freedomben · 28m ago
> startup history is littered with failed companies using the best tech, and they were often beaten by companies taking a worse-is-better approach.
Indeed. In my idealistic youth I bought heavily into "if you build it, they will come," but that turned out not to be reality at all. Oftentimes the best product loses because of marketing, network effects, or some other reason that has nothing to do with the tech. I wish it weren't that way, but if wishes were fishes we'd all have a fry.
seabrookmx · 2h ago
Most tech hits a point of diminishing returns.
I don't think we're there yet, but it's reasonable to expect at _some point_ your typical OS model could be 98% of the way to a cutting edge commercial model, and at that point your last sentence probably doesn't hold true.
Fade_Dance · 2h ago
There is a sweet spot, and at $100k per dev per year some businesses may choose lower-priced options.
The business itself will also massively develop in the coming years. For example, there will be dozens of providers for integrating open source models with an in-house AI framework that smoothly works with their stack and deployment solution.
hsuduebc2 · 2h ago
I agree. It isn't in the interest of any actor, including OpenAI, to give out their tools for free.
mockingloris · 2h ago
Most devs where I'm from would scrape to cough up that amount
More niche, use-case-specific models have to be developed for cheaper, energy-optimized hardware.
└── Dey well
skybrian · 2h ago
This would be a business expense. Compared to hiring a developer for a year, it would be more reasonable.
For a short-term gig, though, I don’t think they would do that.
crestfallen33 · 2h ago
I'm not sure where the author gets the $100k number, but I agree that Cursor and Claude Code have obfuscated the true cost of intelligence. Tools like Cline and its forks (Roo Code, Kilo Code) have shown what unmitigated inference can actually deliver.
The irony is that Kilo itself is playing the same game they're criticizing. They're burning cash on free credits (with expiry dates) and paid marketing to grab market share -- essentially subsidizing inference just like Cursor, just with VC money instead of subscription revenue.
The author is right that the "$20 → $200" subscription model is broken. But Kilo's approach of giving away $100+ in credits isn't sustainable either. Eventually, everyone has to face the same reality: frontier model inference is expensive, and someone has to pay for it.
patothon · 9m ago
That's a good point. However, maybe the difference is that Kilo is not creating a situation for themselves where they either have to reprice or throttle.
I believe it's pretty clear when you use these credits that they're temporary (and a marketing strategy), vs. Claude/Cursor, which have to fit their costs into the subscription price and make things opaque to you.
fragmede · 2h ago
Also frontier model training is expensive, and at some point, eventually, that bill also needs to get paid, by amortizing over inference pricing.
fercircularbuf · 55m ago
It sounds like Uber
cyanydeez · 2h ago
Oh, go one more step: the reality is these models are more expensive than hiring an intern to do the same thing.
Unless you've got a trove of self-starters with a lot of money, they aren't cost efficient.
jeanlucas · 2h ago
So convenient that a future AI dev will cost as much as a human developer. Pure coincidence.
magicalhippo · 2h ago
Similar to housing in attractive places, no? Price is related to what people can afford, rather than what the actual house/unit is worth in terms of material and labor.
maratc · 2h ago
Except for "material and labor" there is an additional cost of land.
That is already "related to what people can afford", in attractive places or not.
thisisit · 2h ago
This is just a ballpark number. The idea is that the AI dev will cost somewhat less than a human developer: enough for AI providers to have huge margins and for CTOs to say, "I replaced all devs and saved so much money".
mattmanser · 1h ago
And then the CTOs will learn the truth that most product managers are just glorified admin assistants who couldn't write a spec for tic-tac-toe.
And that to write the business analysis that the AI can actually turn into working code requires senior developers.
jgalt212 · 2h ago
It's sort of like how high cost funds net of fees offer the same returns as low cost ETFs net of fees.
oblio · 1h ago
I'm not sure I understand this one.
insane_dreamer · 1h ago
The full cost of an employee is a fair bit more than just their base salary.
SoftTalker · 1h ago
Wait until the taxes on AI come, to pay for all the unemployment they are creating.
eli_gottlieb · 2h ago
I mean, hey, rather than use AI at work, I'll just take the extra $100k/year and be just that good.
naiv · 2h ago
But it works 24/7, at maybe 20x the output.
nicce · 2h ago
Why can't we keep the current jobs but accelerate humanity's development by more than 20x with AI? Everyone is just talking about replacement, without mentioning the potential.
dsign · 1h ago
There is great potential. But if humanity can't share a loaf of bread with the needy, nor stop the blood irrigation of the cracked, dusty soil of cursed Canaan[^1], what are the odds that that acceleration will benefit anybody?
([^1]: They have been at it for a long while now, a few thousand years?)
hx8 · 2h ago
I don't think there is market demand for 20x more software produced each year. I suspect AI will actively _decrease_ demand for several major sectors of software development, as LLMs take over roles that were previously handled by independent applications.
nicce · 1h ago
I think it depends on how you view it.
With 20x productivity you can start to shrink your supply chain and reduce costs in the long term. No more cloud usage in foreign countries, since you might be able to build the necessary software yourself.
You can start dropping expensive SaaS once you can build enough for your own internal needs. Heck, I would argue demand will increase because there is so much potential. Consultants and third-party software houses will likely decline, unless they become even more efficient.
LLMs act as interfaces to applications that you can now build yourself and run on your own hardware, since you are much more capable.
taftster · 2h ago
Right. This is insightful. It's not so much about replacing developers, per se. It's about replacing applications that developers were previously employed to create/maintain.
We talk about AI replacing a workforce, but your observation that it's more about replacing applications is spot on. That's definitely going to be the trend, especially for traditional back-office processing.
hx8 · 2h ago
I'm specifically commenting on the double whammy of increased software developer productivity and decreased demand for independent applications.
I'm not entirely sure I understand exactly what you're suggesting, but I'd imagine it's because a company that doesn't have to pay people will out-compete the company that does.
There could be some scenario where it is advantageous to have humans working with AI. But if that isn't how reality plays out then companies won't be able to afford to pay people.
SpaceNoodled · 1h ago
An LLM by itself has 0% output.
An engineer shackled to an LLM has about 80% output.
croes · 2h ago
And is neither reliable nor liable.
crinkly · 2h ago
Like fuck that's happening. A human dev will spend the entire day gaslighting an electronic moron rather than an outsourced team.
The only argument we have so far is wild extrapolation and faith. The burden of proof is on the proclaimer.
IshKebab · 2h ago
> Both effects together will push costs at the top level to $100k a year. Spending that magnitude of money on software is not without precedent, chip design licenses from Cadence or Synopsys are already $250k a year.
For how many developers? Chip design companies aren't paying Synopsys $250k/year per developer. Even when using formal tools which are ludicrously expensive, developers can share licenses.
In any case, the reason chip design companies pay EDA vendors these enormous sums is because there isn't really an alternative. Verilator exists, but ... there's a reason commercial EDA vendors can basically ignore it.
That isn't true for AI. Why on earth would you pay more than a full-time developer's salary in AI tokens when you could just hire another person instead? I definitely think AI improves productivity, but it's like 10-20%, not 100%.
cornstalks · 2h ago
> For how many developers? Chip design companies aren't paying Synopsys $250k/year per developer. Even when using formal tools which are ludicrously expensive, developers can share licenses.
That actually probably is per developer. You might be able to reassign a seat to another developer, but that's still arguably one seat per user.
IshKebab · 1h ago
I don't think so. The company I worked for until recently had around 200 licenses for our main simulator - at that rate it would cost $50m/year, but our total run rate (including all salaries and EDA licenses) was only about $15m/year.
They're super opaque about pricing but I don't think it's that expensive. Apparently formal tools are way more expensive than simulation though (which makes sense), so we only had a handful of those licenses.
I managed to find a real price that someone posted:
https://www.reddit.com/r/FPGA/comments/c8z1x9/modelsim_and_q...
> Questa Prime licenses for ~$30000 USD.
That sounds way more realistic, and I guess you get decent volume discounts if you want 200 licenses.
ankit219 · 23m ago
There's no justification for the $100k number. At $100k a year, or about $8k a month, you would be using around 1B tokens a month, per person (and that's at a generous blended $8 per million input/output tokens including caching; the real blended number is lower).
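The arithmetic behind that estimate, with the blended token price as an assumption rather than any published rate:

    # How many tokens a $100k/year budget buys at an assumed blended rate.
    annual_budget = 100_000
    monthly_budget = annual_budget / 12        # ~$8,333
    blended_usd_per_million_tokens = 8         # assumed blended input/output price
    tokens_per_month_millions = monthly_budget / blended_usd_per_million_tokens
    print(f"~{tokens_per_month_millions:.0f}M tokens/month per person")  # ~1042M, i.e. ~1B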
I think there is a case that Anthropic did not reduce its pricing because it has the best coding models out there. Their recent fundraise disclosed gross margins of 60% (and -30% on usage via Bedrock etc.). That way they can offer 2.5x more tokens at the same price as the vibe-coding companies and still break even. Where the market assumption did not work out is that we still only have Claude, the model that made vibe coding work and is the most tasteful about what users want. There are probably models better at thinking and logic, especially o3, but this signals Claude's staying power: lock-in, popularity, and a challenge to the more fundamental assumption that language models are commodities.
(Speculating) Many companies would want to move away from Claude but can't because users love the models.
jjcm · 2h ago
At some point the value of remote inference becomes more expensive than just buying the hardware locally, even for server-grade components. A GB200 is ~$60-70k and will run for multiple years. If inference costs continue to scale, at some point it just makes more sense to run even the largest models locally.
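Rough amortization math for that comparison, using the commenter's price estimate and an assumed 3-year useful life, and ignoring power, hosting, and utilization:

    # Owning a GB200 vs. $100k/yr of hosted inference (very rough sketch).
    gpu_cost = 65_000                       # midpoint of the ~$60-70k estimate
    useful_life_years = 3                   # assumed
    owned_per_year = gpu_cost / useful_life_years
    hosted_per_year = 100_000
    print(f"owned: ~${owned_per_year:,.0f}/yr vs hosted: ${hosted_per_year:,}/yr")
    # owned: ~$21,667/yr vs hosted: $100,000/yr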
OSS models are only ~1 year behind SOTA proprietary, and we're already approaching a point where models are "good enough" for most usage. Where we're seeing advancements is more in tool calling, agentic frameworks, and thinking loops, all of which are independent of the base model. It's very likely that local, continuous thinking on an OSS model is the future.
tempest_ · 2h ago
Maybe $60-70k nominally, but where can you get one that isn't part of its entire rack configuration?
jjcm · 53m ago
Fair, but even if you budget an additional $30k for a self-contained small-unit order, you've brought yourself to the equivalent proposed spend of 1 year of inference.
At $100k/yr/eng inference spend, your options widen greatly is my point.
boltzmann_ · 2h ago
The author just chose a nice number and gives no argument for it.
mromanuk · 2h ago
Probably chose $100k/yr as an example of the salary of a developer.
whateveracct · 2h ago
This is the goal. Create a reason to shave a bunch off the top of SWE salaries. Pay them less because you "have" to pay for AI tools. All so they don't have to do easy rote work - you still get them to do the high level stuff humans must do.
sovietmudkipz · 2h ago
What is everyone’s favorite parallel agent stack?
I’ve just become comfortable using GH Copilot in agent mode, but I haven’t started letting it work in an isolated way in parallel to me. Any advice on getting started?
typs · 2h ago
This makes sense as long as people continue to value using the best models (which may or may not continue for lots of reasons).
I’m not entirely sure that AI companies like Cursor necessarily miscalculated, though. As noted, the actual strategies the blog advertises are things tools like Cursor already use (via auto mode). What matters for them is that they can successfully push users toward their auto mode, use the resulting usage data to improve their routing, and that frontier models don’t continue to be so much better AND so expensive that users keep demanding them. I wouldn’t hate that bet if I were Cursor, personally.
thebigspacefuck · 1h ago
Never heard of kilo before, pretty sure this post is just an ad
lvl155 · 38m ago
I hadn’t heard of them either, but now I am getting ads from them. I guess that was their plan.
dcre · 1h ago
"The bet was that by the following year, the application inference would cost 90% less, creating a $160 gross profit (+80% gross margins). But this didn't happen, instead of declining the application inference costs actually grew!"
This doesn't make any sense to me. Why would Cursor et al expect they could pocket the difference if inference costs went down? There's no stickiness to the product; they would compete down to zero margins regardless. If anything, higher total spend is better for them because it's more to skim off of.
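For reference, the arithmetic in the quoted bet, using the blog's example numbers rather than any real Cursor figures:

    # The hypothetical: a $200/mo subscription sitting on $400/mo of inference.
    price = 200
    cost_today = 400
    margin_today = (price - cost_today) / price            # -1.0 -> -100% gross margin
    cost_after_90pct_drop = cost_today * 0.10              # $40
    margin_then = (price - cost_after_90pct_drop) / price  # 0.8 -> +80%, i.e. $160 gross profit
    print(margin_today, margin_then)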
AstroBen · 1h ago
> charge users $200 while providing at least $400 worth of tokens, essentially operating at -100% gross margin.
Why are we assuming everyone uses the full $400? Margins aren't calculated based only on the heaviest users.
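A toy illustration of that blended-margin point; the usage split is invented purely for the example:

    # Hypothetical subscriber mix: a minority of heavy users, a majority of light ones.
    price = 200
    heavy_share, heavy_cost = 0.2, 400     # assumed
    light_share, light_cost = 0.8, 50      # assumed
    avg_cost = heavy_share * heavy_cost + light_share * light_cost   # $120
    blended_margin = (price - avg_cost) / price                      # 0.4 -> +40%
    print(avg_cost, blended_margin)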
And where are they pulling the 100k number from?
zahlman · 2h ago
> The difference in pay between inference and training engineers is because of their relative impact. You train a model with a handful of people while it is used by millions of people.
Okay, but when did that ever create a comparable effect for any other kind of software dev in history?
mockingloris · 2h ago
@g42gregory This would mean that for certain devs, an unfair advantage would be owning a decent on-prem rig running a fine-tuned model optimized for their specific use case.
A fellow HN user's post I engaged with recently talked about low hanging fruits.
What that means for me and where I'm from is some sort of dev-loan initiative by NGOs and government grants, where devs get access to these models/hardware and repay with some form of value.
What that is, I haven't thought that far. Thoughts?
└── Dey well
hx8 · 2h ago
How many parallel agents can one developer actively keep up with? Right now, my number seems to be about 3-5 tasks, if I review the output.
If we assume 5 tasks, each running $400/mo of tokens, we reach an annual bill of $24,000. We would have to see a 4x increase in token cost to reach the $100,000/yr mark. This seems possible with increased context sizes. Additionally, larger contexts might lead to longer-running, more complicated tasks, which would increase my number of parallel tasks.
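The same arithmetic spelled out, with the task count and per-task spend taken from the assumptions above:

    # 5 parallel tasks at $400/month of tokens each.
    tasks = 5
    per_task_monthly = 400
    annual_bill = tasks * per_task_monthly * 12      # $24,000/yr
    multiplier_to_100k = 100_000 / annual_bill       # ~4.2x
    print(annual_bill, round(multiplier_to_100k, 1))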
6thbit · 1h ago
An interesting metric is when token bills per dev exceed the cost of hiring a new dev. But also, if paying another dev's worth in tokens gets you further than two devs without AI, will you still pay it?
I wonder how the economics will play out, especially when you add in all the different geographic locations for remote devs and their cost.
jjmarr · 1h ago
They already do for anything not in Western Europe/North America.
The $100k/dev/year figure feels like sticker-shock math more than reality. Yes, AI bills are growing fast - but most teams I see are still spending substantially less annually, and that's before applying even basic optimizations like prompt caching, model routing, or splitting work across models.
The real story is the AWS playbook all over again: vendors keep dropping unit costs, customers keep increasing consumption faster than prices fall, and in the end the bills still grow. If you’re not measuring it daily, the "marginal cost is trending down" narrative is meaningless - you’ll still get blindsided by scale.
I'm biased but the winners will be the ones who treat AI like any other cloud resource: ruthlessly measured, budgeted, and tuned.
oblio · 35m ago
Ironically, except for Graviton (and that's also plateauing; plus it requires that you're able to use it), basically no old AWS service has been reduced in cost since 2019. EC2, S3, etc.
nunez · 1h ago
Dude, thank you for this service. I use ec2instance.info and vantage.sh for Azure all of the time.
austin-cheney · 2h ago
There is nothing new here and the math is pretty simple. AI greatly increases automation, but its output is not trusted. All research so far shows AI-assisted development is a zero-sum game for time and productivity, because the time saved by AI is reinvested in more thorough code reviews than were otherwise required.
Ultimately, this will become a people problem more than a financial problem. People who lack the confidence to code without AI will cost less to hire and dramatically more to employ, no differently than people reliant on large frameworks. All historical data indicates employers will happily eat that extra cost if it means candidates are easier to identify and select, because hiring and firing remain among the most serious considerations in technology selection.
Candidates currently thought of as 10x, who are productive without these helpers, will remain no more or less elusive than they are now. That means employers must choose between higher risk and higher selection cost for a potentially higher return on investment, knowing that the ROI is only realized if these high-performance candidates are allowed to execute with high productivity. Employers will gladly eat increased expenses if it lowers the risk of candidate selection.
jjmarr · 1h ago
You're assuming it's a binary between coding with or without AI.
In my experience, a 10x developer that can code without AI becomes a 100x developer because the menial tasks they'd delegate to less-skilled employees while setting technical direction can now be delegated to an AI instead.
If your only skill is writing boilerplate in a framework, you won't be employed to do that with AI. You will not have a job at all and the 100xer will take your salary.
austin-cheney · 33m ago
Those are some strange guesses.
oblio · 39m ago
The thing is, the 100x can't be in all the verticals, speak all the languages, be a warm body required by legislation, etc, etc. Plus that 100x just became a 10x (x 10x) bus factor.
This will reduce demand for devs but it's super likely that after a delay, demand for software development will go even higher.
The only thing I don't know is what that demand for software development will look like. It could be folded into DevOps work or IT project management work or whatever.
I guess we'll see in a few years.
zeld4 · 2h ago
Give me a $50k raise and I'll only need $10k/yr.
Seriously, I don't see the AI outcome being worth that much yet.
At the current level of AI tools, the attention needed to manage 10+ async tasks is beyond the limit for most humans.
In 10 years maybe, but $100k will probably be worth much less by then.
daft_pink · 1h ago
If you are throttled at $200 per month, you should probably just pay another $200 a month for a second subscription, because the value is there. That’s my take from using Claude.
jvanderbot · 1h ago
It's not hard to imagine a future where I license their network for inference on my own machine, and they can focus on training.
oblio · 32m ago
The problem with this is that the temptation to do more is too big. Nobody wants to be a "dumb pipe", a utility.
What does "Dey well" and "Yarn me" mean at the bottom of your comments?
mockingloris · 2h ago
They are Nigerian Pidgin English words:
- Dey well: Be well
- Yarn me: Let's talk
└── Dey well/Be well
SoftTalker · 1h ago
Please don't use signature lines in HN comments.
Edit: Would have sworn that this was in the guidelines but I don't see it just now.
nmeofthestate · 1h ago
Ok, don't do that.
lvl155 · 39m ago
What is Kilocode?
mwkaufma · 1h ago
Title modded without merit.
masterj · 2h ago
Why even stop at 100k/yr? Surely the graph is up-and-to-the-right forever? https://xkcd.com/605/
chiffre01 · 2h ago
Honestly we're in a race to the bottom right now with AI.
It's only going to get cheaper to train and run these models as time goes on. Models running on single consumer-grade PCs today were almost unthinkable four years ago.
gedy · 2h ago
Maybe this is why companies are hyping the "replacing devs" angle, since "wow, see, we're still cheaper than that engineer!" is going to be the only viable pitch.
woeirua · 1h ago
It's not viable yet, and at current token spend rates, it's likely not going to be viable for several years.
turnsout · 2h ago
Tools like Cursor rely on the gym model—plenty of people will pay for a tier that they don't fully utilize. The heavy users are subsidized by the majority who may go months without using the tool.
AtNightWeCode · 2h ago
Don't know about the numbers, but is this not the cloud all over again? Promises of cheap storage you don't have to maintain turned into maintenance hell, with storage costs steadily rising instead of dropping.
yieldcrv · 2h ago
I think what this model actually showed is a cyclical aspect of tokens as a commodity
It is based on supply and demand for GPUs. Demand currently outstrips supply, while the 'frontier models' are also much more computationally efficient than last year's models in some ways - using far fewer computational resources to do the same thing.
So now that everyone wants to use frontier models in "agentic mode", with reasoning eating up a ton more tokens before settling on a result, demand is outpacing supply. But it is possible it equalizes yet again, before the cycle begins anew.
throwanem · 2h ago
"Tokenomics."
TranquilMarmot · 2h ago
I studied this in college but I think we had a different idea of what "toke" means
throwanem · 1h ago
Eh. The implicit claim is the same as everywhere, namely that that $100k/dev/year of AI opex is an enormous bargain over going up two orders of magnitude in capex to pay for the same output from a year's worth of a team. But now that Section 174's back and clearly set to stay for a good long while, it makes sense to see this line of discourse come along.
senko · 2h ago
tl;dr
> This is driven by two developments: more parallel agents and more work done before human feedback is needed.