What's the point of this article besides free propaganda? It seems to me like every other AI shop except for OpenAI and possibly Anthropic only gets mentioned once they actually release something.
janalsncm · 1d ago
That was my thought, except the author mentioned they couldn’t even get a comment from OpenAI for their “article”. Can’t beat free advertising.
butterlettuce · 1d ago
Probably because OpenAI is better at this than any other player in the game. They’re on their way to replacing Google as the #1 search engine. Instead of “google it” it’s going to be “gpt it”, and we all know who “gpt” is.
kabes · 1d ago
Not sure. Since Google started to include a Gemini response on top of their search results I stopped using chatgpt for search
moralestapia · 1d ago
Funny, the AI summary makes the experience shittier for me.
The UI jumps and everything moves, and now I have to wait until it loads. Massive UX mistake; you learn this in your first week of making websites...
jonas21 · 1d ago
Hmm... it doesn't jump for me. There's a fixed amount of space reserved at the top of the screen and the AI overview loads into that. It only expands if you press the "Show more" button.
j_timberlake · 21h ago
No one is ever going to say "gpt it", regardless of OpenAI's future.
malux85 · 23h ago
When I had some mild success with my first startup in the UK, I got flooded by news organisations offering "premium publication content" --- basically pay us a big chunk of cash to write a positive article on your company.
I was surprised at the number and calibre of orgs that came to me who would basically say anything I wanted for cash. This opened my eyes a lot and made me very suspicious of published media.
pmdr · 16h ago
I don't think they're even getting paid for this one.
alecco · 13h ago
One hand washes the other.
swyx · 1d ago
the media outlets respond to our clicks, and we click on gpt5 stories. monkey brain go brr.
cheeze · 1d ago
For OpenAI at least, it's obvious. They are considered the industry leader at this point and are the most widely used LLM that folks are aware of (arguably, Google's 'in search' summarization is the most widely used).
People get excited on an update in such a rapidly changing space. It really is that simple.
If it were any good I would assume there would be no need to hype it up.
My theory is that LLMs will get commoditized within the next year. The edge that OpenAI had over the competition is arguably lost. If the trend continues we will be looking at inference priced like a commodity, where the most efficient providers, like Cerebras and Groq, will be the only ones actually making money in the end.
moojacob · 1d ago
They already are. I have been using Kimi K2. It is 90% as good as Sonnet, and on Groq it's 3x faster and 1/5th the price.
pizzalife · 1d ago
What kind of GPU setup are you using for Kimi?
evandena · 1d ago
Sounds like he's using it on Groq, no self hosting.
moojacob · 1d ago
No, I don't self host. I use Groq.com as my provider.
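For anyone curious, here's a rough sketch of what that looks like, since Groq exposes an OpenAI-compatible API. The base URL and the Kimi K2 model id below are from memory, so treat them as assumptions and check Groq's current model list:

    from openai import OpenAI

    # Point the standard OpenAI client at Groq's OpenAI-compatible endpoint.
    client = OpenAI(
        base_url="https://api.groq.com/openai/v1",  # assumed endpoint; verify in Groq's docs
        api_key="YOUR_GROQ_API_KEY",
    )

    # Model id assumed from Groq's catalog; verify before relying on it.
    resp = client.chat.completions.create(
        model="moonshotai/kimi-k2-instruct",
        messages=[{"role": "user", "content": "Explain tail-call optimization in two sentences."}],
    )
    print(resp.choices[0].message.content)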
OldfieldFund · 13h ago
isn't it Q4 quantized on groq?
csomar · 16h ago
That's an interesting name to choose. For a second there, I thought Grok had enabled third-party models.
Jackson__ · 16h ago
>If it were any good I would assume there would be no need to hype it up.
If it were good, there would also be no need for their devs to tell people to temper their expectations, alas... [0]
"My theory is that LLMs will get commoditized within the next year."
Incredibly bad theory, it's like you're saying every LLM is the same because they can all talk, even though the newer ones continue to smash through benchmarks the older ones couldn't. And now it happens quarterly instead of yearly, so you can't even say it's slowing down.
[0] https://xcancel.com/alexwei_/status/1946477756738629827#m
dimitri-vs · 23h ago
I don't think so, look at how Sora changed every... Well Operator was a game changer for.. Hmm, but what about gpt-4.5 or PhD level o3... o3-pro...? I mean, the 10k/mon agents are definitely coming... any day now...
Anyway, I'm sure gpt-5 will be AGI.
infecto · 1d ago
At the moment most of the dollars are coming from consumer, inclusive of business, subscriptions. That’s where the valuations are getting pegged and most API dollars are probably seen as experimental. The model quality matters but product experience is what is driving revenue. In that sense OpenAI is doing quite well.
janalsncm · 1d ago
If that is the case, the $300 billion question is whether someone can create a product experience that is as good as OpenAI’s.
In my mind there are really three dimensions they can differentiate on: cost, speed, and quality. Cost is hard because they’re already losing money. Speed is hard because differentiation would require better hardware (more capex).
For many tasks, perhaps even a majority right now, quality of free models is approaching good enough.
OpenAI could create models which are unambiguously more reliable than the competition, or ones which are able to answer questions no other model can. Neither of those has happened yet afaik.
epicureanideal · 1d ago
Competitors just need to wait for OpenAI to burn all their free money and dig themselves a debt hole they can’t easily climb out of, and then offer a similar experience at a price that barely breaks even or makes a tiny profit, and they win.
runako · 1d ago
> three dimensions they can differentiate on: cost, speed, and quality
The fourth dimension is likely to be the most powerful of the differentiators: specificity.
Think Cursor or Lovable, but tailored for other industries.
There's a weird thing where engineers tend to be highly paid, but people who employ engineers are hesitant to spend highly on tools to make their engineers more productive. Hence all Cursor's magic only gets its base price to ~50% of Intercom's entry-level fee for a tool for people who do customer support.
LLMs applied to high-value industries outside of tech are going to be a big differentiator. And the companies that build such solutions will not have the giant costs associated with building the next foundation model, or potentially operating any models at all.
jryle70 · 19h ago
Which year did Linux become the dominant desktop OS?
empath75 · 1d ago
> If it were any good I would assume there would be no need to hype it up.
Yes, this is why Apple famously just dumped the original iPhone on the market without telling anybody about it ahead of time.
SiempreViernes · 1d ago
With this comparison you are saying the original iPhone was like version 6 of a well-established line of products in a market that had seen major releases a few times a year for about three years.
That's certainly not how the first iphone is usually described.
tootie · 1d ago
The fact that xAI exists only out of Elon Musk's personal spite and they still produced a top-performing model certainly implies that model training isn't any kind of moat. It's very expensive, but not mysterious.
ml-anon · 1d ago
“Top performing model”
Ie overfit to benchmarks.
93po · 21h ago
source?
himeexcelanta · 1d ago
They can call it whatever they want…not sure that has a great deal of meaning unless there’s a GPT-4/Claude 3.5 level step change.
jug · 23h ago
I doubt it will. OpenAI planned to release GPT-5 in 2024 or early 2025, but it underwhelmed, and anonymous OpenAI sources have claimed that what later shipped as GPT-4.5 was actually that GPT-5, relabelled to manage expectations. It was seen as roughly a 20% improvement over GPT-4o. This is when it sank in for OpenAI that they were at the end of the road for non-reasoning models: scaling issues made them too costly.
Turning to their reasoning models, it’s also known and documented through SimpleQA and PersonQA that OpenAI o3 hallucinates more than o1, and o4-mini even more than o3. There’s an unmanaged issue where training on synthetic data improves benchmark results on STEM tasks but increases hallucination rates, especially troubling OpenAI models for some reason (my guess: they’re fine-tuned to take risks since it’s known to also increase likelihood of getting it right for hard tasks?)
Google has long known that OpenAI struggles with hallucinations more than they do, according to an anonymous Googler I saw comment on this. The aforementioned benchmarks bear that out. Anthropic also struggles less. But as far as I can tell, they’re all facing issues with synthetic data acting as a double-edged sword.
So GPT-5 is going to be interesting. Exactly how well it does will say a lot about the kind of trouble OpenAI is in right now. Maybe OpenAI has found a novel approach to reducing hallucinations? I think that’s among their most crucial points right now. But other than this, no, I don’t expect a revolution, only an evolution. They might currently win benchmarks, but it will hardly be something that catapults them.
If GPT-5 underwhelms, the signal will be stronger than the mere fact that GPT-5 underwhelms: it would mean OpenAI has trouble with both non-reasoning and reasoning models, and we’re likely looking at the end of the road on the horizon for current GPT-based LLMs, one where the winners will probably ultimately be cheaper open-weight models once they catch up.
j_timberlake · 21h ago
This is the smart take. The version numbers were useful going from GPT-3 to GPT-3.5 and then GPT-4, but after that OpenAI butchered the usefulness of the naming scheme, with stuff like o4 and 4o. Version numbers now tell you almost nothing about the change.
This happened because training progressively larger models used to be the main path forward, which was easy to track and name, but currently it's all about quickly incorporating synthetic data chain-of-thoughts created by flash models.
jobs_throwaway · 1d ago
Meh. The models are already quite useful. If the improvement is half of what the jump to GPT-4 was, it will be a big deal.
himeexcelanta · 1d ago
Agree on the usefulness of models. They still require a lot of babysitting for software development. We’re seeing marginal improvements, but the aggregate utility still adds up over time. Am just skeptical of OpenAI at this point for most things.
OutOfHere · 1d ago
OpenAI promises too much and delivers too little. It promised "Agent mode" and "Study and learn", but I have neither despite paying for the service.
I get the impression that OpenAI will rename what's intended as o4 to gpt-5 and package it as such.
joshstrange · 1d ago
I will die on this hill.
Nothing annoys me more than OpenAI acting like something is rolling out (or rolled out already) and then taking forever to do so.
> ChatGPT agent starts rolling out today to Pro, Plus, and Team; Pro will get access by the end of day, while Plus and Team users will get access over the next few days.
"Next few days" - It's 8 days later (so far). Lest one think "It's only 8 days, geez, calm down": They do this _all the time_. I don't even remember the length of the gap between announcing the enhanced voices and then forgetting about it completely before it finally rolled out.
It sours every announcement they make in my opinion.
j_timberlake · 21h ago
I feel like Altman is taking Elon's path. ChatGPT was his Tesla Model 3. Now he's in the overpromise/underdeliver phase, he's probably going to get himself into a "funding secured" moment soon enough, but he definitely hasn't started calling people "pedo" yet.
gneuron · 13h ago
Arguably he already did, with the $40B round from Masayoshi Son: Son essentially committed only $10B, and the other $30B was conditional on OpenAI converting from a nonprofit by the end of the year. Now that possibility is in question because OpenAI and Microsoft can't agree on a deal (and Microsoft controls whether, and when, OpenAI converts to a for-profit).
kifler · 1d ago
For what it's worth, I had Agent Mode deployed to my account earlier this morning.
MaxfordAndSons · 1d ago
> Altman decided to let GPT-5 take a stab at a question he didn’t understand. “I put it in the model, this is GPT-5, and it answered it perfectly,” Altman said.
If he didn't understand the question how could he know the model answered it perfectly?
janalsncm · 1d ago
Pay close attention to these demos. Often the AI is OK but not amazing, yet because the output is shaped like the right thing they don't look any deeper.
It makes selling improvements fairly hard actually. If the last model already wrote an amazing poem about hot dogs, the English language doesn’t have superlatives to handle what the next model creates.
foolfoolz · 1d ago
usually less perfect is a better sign of integrity
amohn9 · 1d ago
This statement is definitely just marketing hype, but if we're being pedantic there are tons of questions that are hard to answer but have easy to verify solutions, e.g. all NP-complete problems.
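To make that concrete, here is a toy subset-sum sketch (the names and numbers are just for illustration): checking a proposed answer is a membership test and a sum, while finding one by brute force blows up exponentially.

    from itertools import combinations

    def verify(nums, subset, target):
        # Checking a certificate: a membership test and a sum.
        return set(subset) <= set(nums) and sum(subset) == target

    def solve(nums, target):
        # Naive search: tries every subset, exponential in len(nums).
        for r in range(len(nums) + 1):
            for combo in combinations(nums, r):
                if sum(combo) == target:
                    return list(combo)
        return None

    nums = [3, 34, 4, 12, 5, 2]
    answer = solve(nums, 9)                  # the hard part: search
    print(answer, verify(nums, answer, 9))   # the easy part: verification -> [4, 5] True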
SiempreViernes · 1d ago
No, that is being generous towards marketing hype.
If we are being pedantic we could never accept "question we don't know how to answer" as a possible interpretation of "question we don't understand".
asadotzler · 1d ago
There's also nothing here that wasn't true of GPT4 so why is bragging about it for GPT5 something notable?
CommieBobDole · 1d ago
>If he didn't understand the question how could he know the model answered it perfectly?
Also, 'thing that I don't know about but is broadly and uncontroversially known and documented by others' is sort of dead center of the value proposition for current-generation LLMs and also doesn't make very impressive marketing copy.
Unless he's saying that he fed it an unknown-to-experts-in-the-field question and it figured it out, in which case I am very skeptical.
zuminator · 1d ago
There's plenty of things to roast Altman about, but this isn't really one of them. A specialized problem might not be understood by someone unversed in that field even if the solution is simple and knowable. "What is the Euler characteristic of a torus?" for a rudimentary off-the-cuff example. Altman could easily know or check the answer ("0") without ever understanding the meaning of the question.
asadotzler · 1d ago
There's nothing new in your claim (or his) that wasn't 100% the case with GPT4. This is Altman's brag on GPT5, not generative AI in general, so it's gotta say something it couldn't about GPT4 or it's just bullshit filler.
dragonwriter · 1d ago
It takes a really special kind of self-delusion to recognize that you don't understand the question and also think you are qualified to evaluate the answer.
polotics · 23h ago
I can only assume that as part of the GPT's answer came a thorough explanation of the question, which meant that dear Sam got first to understand his question, and then could read further to see that the answer was good. One can dream, or at least that's what he wants us to do.
andsoitis · 1d ago
Couldn’t someone who does understand it verify for you?
Insanity · 1d ago
Maybe, after thinking for a really long time, GPT-5 said "42". The answer might have been so shocking to Altman that now he'll have to build GPT-6.
But more seriously - it's a ridiculous statement to think you understand the answer when you don't understand the question in the first place.
ml-anon · 1d ago
World class grifter grifting hard
MaxPock · 1d ago
Evolution or revolution? They’d better deliver, as Gemini has been hogging all the attention and open source models are fast catching up.
Insanity · 1d ago
I use Gemini at work, and ChatGPT for everything else 'personal'. Also from my non-tech friends, I usually hear them talk about ChatGPT and not Gemini or other models. I think, even if Gemini outperforms ChatGPT, there's definitely a strong 'first mover advantage' at play. I suspect that being outperformed by Gemini etc won't diminish their market share significantly.
sequin · 22h ago
It seems that at least in software engineering, Claude is popular and people are shoveling tons of money into it, whereas non-tech people are not inclined to pay for ChatGPT. Market share might not be so important if your competitor is raking in all the cash.
caseyf7 · 23h ago
Are they losing so many customers to Anthropic, Gemini, etc that they have to pre-announce this?
manishsharan · 1d ago
Are they going to require my DNA and blood samples to access this ?
Screw their organization verification. I am taking my business to Claude or Deepseek.
andrewstuart · 1d ago
There’s so much work to be done developing coding related tools that integrate AI and traditional coding analysis and debugging tools.
Also programming needs to be redesigned from the ground up as LLM first.
bgribble · 1d ago
I am still skeptical about the value of LLM as coding helper in 2025. I have not dedicated myself to an "AI first" workflow so maybe I am just doing it wrong.
The most positive metaphor I have heard about why LLM coding assistance is so great is that it's like having a hard-working junior dev that does whatever you want and doesn't waste time reading HN. You still have to check the work, there will be some bad decisions in there, the code maybe isn't that great, but you can tell it to generate tests so you know it is functional.
OK, let's say I accept that 100% (I personally haven't seen evidence that LLM assistance is really even up to that level, but for the sake of argument). My experience as a senior dev is that adding juniors to a team slows down progress and makes the outcome worse. You only do it because that's how you train and mentor juniors to be able to work independently. You are investing in the team every time you review a junior's code, give them advice, answer their questions about what is going on.
With an LLM coding assistant, all the instruction and review you give it is just wasted effort. It makes you slower overall and you spend a lot of time explaining code and managing/directing something that not only doesn't care but doesn't even have the ability to remember what you said for the next project. And the code you get out, in my experience at least, is pretty crap.
I get that it's a different and, to some, interesting way of programming-by-specification, but as far as I can tell the hype about how much faster and better you can code with an AI sidekick is just that -- hype. Maybe that will be wrong next year, maybe it's wrong now with state-of-the-art tools, but I still can't help thinking that the fundamental problem, that all the effort you spend on "mentoring" an LLM is just flushed down the toilet, means that your long-term team health will suffer.
mixologic · 1d ago
> And the code you get out, in my experience at least, is pretty crap
I think that betrays a fundamental misunderstanding of how AI is changing the goalposts in coding.
Software engineering has operated under a fundamental assumption that code quality is important.
But why do we value the "quality" of code?
* It's easier for other developers (including your future self) to understand, and easier to document.
* Easier to change when requirements change
* More efficient with resources, performs better (cpu/network/disk)
* Easier to develop tests if its properly structured
AI coding upends a lot of that, because all of those goals presume a human will, at some point, interact with that code in the future.
But the whole purpose of coding in the first place is to have a running executable that does what we want it to do.
The more we focus on the requirements and on guiding AI to write tests that prove those requirements are fulfilled, the less we have to actually care about the 'quality' of the code it produces. Code quality isn't a requirement; it's a vestigial artifact of human involvement in communicating with the machine.
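To be concrete about that workflow, here's a tiny illustrative sketch (the function and numbers are made up): the requirement gets pinned down as executable tests, and the generated implementation is treated as a black box that just has to keep them passing.

    # Hypothetical AI-generated implementation; its internal style is
    # irrelevant as long as the requirement-tests below keep passing.
    def apply_discount(price, pct):
        return max(0.0, price * (1 - pct / 100))

    # Requirements captured as executable tests.
    def test_discount_never_negative():
        assert apply_discount(10.0, 150) == 0.0   # over-discounting clamps to zero

    def test_plain_discount():
        assert apply_discount(100.0, 25) == 75.0

    if __name__ == "__main__":
        test_discount_never_negative()
        test_plain_discount()
        print("requirements hold")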
andrewstuart · 1d ago
An important point, well made.
Often AI critics say things like “quality is bad” or “it made coding errors” or “it failed to understand a large code base”.
AI proponents and expert users understand these constraints and know that they are not actually that important.
alex1138 · 1d ago
I don't know how they get their sources, but it would be nice if they came directly from coding documentation (and not random Stack Overflow answers), and if those guides were, I don't know, more machine-readable? (That's not a passive-aggressive use of question marks, I'm genuinely just guessing here.)
asadotzler · 1d ago
>programming needs to be redesigned from the ground up as LLM first.
Yes, because non-deterministic systems make great software. I mean, who wants repeatable execution in the control program for their nuclear submarine or their hospital lighting controls? Why would anyone want a computer capable of actual math running on the President's nuclear "football" when we can have the outputs of hallucinating tools running there?
GPT-5-reasoning alpha found in the wild
https://news.ycombinator.com/item?id=44614644