OpenAI's new GPT-5 models announced early by GitHub
78 points by bkolobara on 8/7/2025, 8:06:48 AM | 72 comments | theverge.com
I find it interesting how marketers are trying to make minimal prompting a good thing, a direction to optimize for. Even when I talk to a senior engineer, I try to be as specific as possible to avoid ambiguities. Pushing the models to just do whatever they think is best is a weird direction. There are so many subtle things and understandings of the architecture that live only in my head or a colleague's head. Meanwhile, I've found that a very good workflow is asking Claude Code to come back with clarifying questions and then a plan, before it just starts executing.
RooCode supports various modes https://docs.roocode.com/basic-usage/using-modes
For example, you can first use the Ask mode to explore the codebase and answer your questions, and let it ask its own questions about what you want to do. Then you can switch over to the Code mode for the actual implementation, or the model itself will ask you to switch, because it's not allowed to change files in Ask mode.
I think that approach works pretty well, especially when you document what needs to be done in a separate Markdown file or something along those lines, which can then be referenced if you have to clear the context, e.g. for a new refactoring task on what's been implemented.
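For anyone who wants to script this ask/plan/code split themselves rather than relying on RooCode's built-in modes, here's a rough sketch using the Anthropic Python SDK. The model name, the prompts, and the PLAN.md convention are my own illustrative placeholders, not anything RooCode or Claude Code prescribes.

    # Minimal sketch of the "clarify -> plan -> implement" workflow described above.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    MODEL = "claude-sonnet-4-20250514"  # placeholder; use whatever model you have access to

    def ask(system: str, history: list[dict]) -> str:
        response = client.messages.create(
            model=MODEL, max_tokens=2048, system=system, messages=history
        )
        return response.content[0].text

    task = "Refactor the payment module to support multiple currencies."
    history = [{"role": "user", "content": task}]

    # Phase 1: force clarifying questions instead of immediate code changes.
    questions = ask(
        "You are in 'ask' mode: do not write code. "
        "List the clarifying questions you need answered before planning.",
        history,
    )
    print(questions)

    # Phase 2: after answering, request a written plan and save it to a Markdown
    # file so it survives a context reset, as suggested above.
    history += [
        {"role": "assistant", "content": questions},
        {"role": "user", "content": "Answers: <fill in>. Now produce a step-by-step plan."},
    ]
    plan = ask("You are in 'plan' mode: output a numbered implementation plan only.", history)
    with open("PLAN.md", "w") as f:
        f.write(plan)

    # Phase 3: only now switch to an implementation ("code") phase that references PLAN.md.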
> I find it interesting how marketers are trying to make minimal prompting a good thing, a direction to optimize.
This seems like a good thing, though. You're still allowed to be as specific as you want to, but the baseline is a bit better.
They do that because, IMHO, the average person prefers something easy over something correct.
Sure - but you're being specific about the acceptance criteria, not the technical implementation details, right?
That's where the models I've been using are at the moment in terms of capability; they're like junior engineers. They know how to write good quality code. If I tell them exactly what to write, they can one-shot most tasks. Otherwise, there's a good chance the output will be spaghetti.
> There are so many subtle things/understandings of the architecture that are just in my head or a colleagues head.
My primary agentic code generation tool at the moment is OpenHands (app.all-hands.dev). Every time it makes an architectural decision I disagree with, I add a "microagent" (long-term context, analogous to CLAUDE.md or Devin's "Knowledge Base").
If that new microagent works as expected, I incorporate it into either my global or organization-level configs.
The result is that it gets more and more aligned with the way I prefer to do things over time.
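The mechanism is simple enough to approximate yourself. Here's a sketch of the general idea (trigger keywords deciding which long-term notes get prepended to the prompt); the file layout and trigger format below are made up for illustration and are not OpenHands' actual scheme.

    # Sketch of trigger-based long-term context ("microagents", CLAUDE.md, etc.):
    # small markdown notes that get prepended to the prompt only when relevant.
    # NOT OpenHands' real implementation; layout and trigger syntax are invented here.
    from pathlib import Path

    def load_notes(directory: str = ".agent/notes") -> list[dict]:
        """Each note's first line lists trigger keywords, e.g. 'triggers: db, migration'."""
        notes = []
        for path in Path(directory).glob("*.md"):
            first, *rest = path.read_text().splitlines()
            triggers = [t.strip().lower() for t in first.removeprefix("triggers:").split(",")]
            notes.append({"triggers": triggers, "body": "\n".join(rest)})
        return notes

    def build_prompt(task: str, notes: list[dict]) -> str:
        relevant = [n["body"] for n in notes if any(t in task.lower() for t in n["triggers"])]
        return "\n\n".join(relevant + [task])

    # A note like "triggers: database, migration" followed by "Always use Alembic for
    # schema changes" gets injected only when the task mentions databases or migrations.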
There is a definite skill gap between folks who are using these tools effectively and those who aren't.
There will always be people who can describe a problem, and you'll always need people who can figure out what's actually wrong.
Watch the company fire 50% of the engineering team then hit a brick wall at 100mph.
I wouldn't say they're completely incapable.
* They can spot (and fix) low hanging fruit instantly
* They will also "fix" things that were left out there for a reason and break things completely
* Even if the code base fits entirely in their context window, as does the complete company knowledge base (including Slack conversations etc.), the proposed solutions sometimes take a very strange turn, in spite of being correct 57.8% of the time.
Today's AI systems are the worst they'll ever be. If AI is already capable of doing something, you should expect it to become more capable of it in the future.
By now, the main reason people expect AI progress to halt is cope. People say "AI progress is going to stop, any minute now, just you wait" because the alternative makes them very, very uncomfortable.
(NB: I'm a very rational person, and based on my lifelong experience and on how many times life has surprised me both negatively and positively, I'd say the chance of a great breakthrough occurring in the short term is 50%. But that has nothing to do with, and cannot be extrapolated from, the current development; this can go any way, really. We've already had multiple AI winters, and I'm sure humanity will have dozens if not hundreds more.)
Are you disappointed that there's no sudden breakthrough that yielded an AI that casually beats any human at any task? That human thinking wasn't obsoleted overnight? That may or may not still happen. But a "slow" churn of +10% performance upgrades results in the same outcome eventually.
There are only so many "+10% performance upgrades" left between ChatGPT and the peak of human capabilities, and the gap is ever diminishing.
OK, so where is the new data going to come from? Fundamentally, LLMs work by doing token prediction when some token(s) are masked. This process (which doesn't require supervision, hence why it scaled) seems to be fundamental to LLM improvement. And basically all of the AI companies have slurped up all of the text (and presumably all of the videos) on the internet. Where does the next order-of-magnitude increase in data come from?
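For anyone who hasn't seen it written down, the self-supervised objective in question is roughly this (a PyTorch sketch with the model itself omitted): the "labels" are just the input shifted by one token, which is exactly why no human annotation is needed and why it scaled to internet-sized corpora.

    import torch
    import torch.nn.functional as F

    def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        """logits: (batch, seq, vocab) from the model; tokens: (batch, seq) of token ids."""
        shifted_logits = logits[:, :-1, :]   # predictions made at positions 0..T-2
        targets = tokens[:, 1:]              # the actual next tokens at positions 1..T-1
        return F.cross_entropy(
            shifted_logits.reshape(-1, shifted_logits.size(-1)),
            targets.reshape(-1),
        )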
More fundamentally, a lot of the hype is about research/novel stuff, which seems to me to be very, very difficult to get from a model that's trained to produce plausible text. Like, how does one expect to see improvements in biology (for example) based on text input and output?
Remember, these models don't appear to reason much like humans, they seem to do well where the training data is sufficient (interpolation) and do badly where there isn't enough data (extrapolation).
I'd love to understand how this is all supposed to change, but haven't really seen much useful evidence (i.e. papers and experiments) on this, just AI CEOs talking their book. Happy to be corrected if I'm wrong.
Look at Claude Code. Unless they hacked into private GitHub/GitLab repos... (which, honestly, I wouldn't put past these tech CEOs; see what Cloudflare recently found out about Perplexity, as an example), but unless they really did that, they trained Claude 4 on approximately the same data as Claude 3. Yet for some reason its agentic coding skills are stupidly enhanced compared to previous iterations.
Data no longer seems to be the bottleneck. Which is understandable. At the end of the day, data is really just a way to get the AI to make a prediction and run gradient descent on it. If you can generate, for example, a bunch of unit tests, you can let the AI freewheel its way into getting them to pass. A kid learns to catch a baseball not by seeing a million examples of people catching balls, but by testing their skills in the real world and gathering feedback on whether their attempt to catch the ball was successful. If an AI can try to achieve goals and assess whether its actions led to a successful or a failed attempt, who needs more data?
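A toy version of "unit tests as the training signal" looks something like this; the model call is omitted and the file names are made up, but the point is that the feedback comes from executing code, not from more human-written text.

    import subprocess
    import tempfile
    from pathlib import Path

    TESTS = Path("test_solution.py").read_text()  # the fixed, human-written test suite

    def reward(candidate_code: str) -> float:
        """Score a model-generated solution by whether the test suite passes."""
        workdir = Path(tempfile.mkdtemp())
        (workdir / "solution.py").write_text(candidate_code)
        (workdir / "test_solution.py").write_text(TESTS)
        result = subprocess.run(
            ["python", "-m", "pytest", "-q", "test_solution.py"],
            capture_output=True, cwd=workdir,
        )
        return 1.0 if result.returncode == 0 else 0.0

    # An RL loop (or plain best-of-n rejection sampling) then pushes the model toward
    # candidates with reward 1.0 -- no new human-written data required.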
The other fundamental bottleneck is compute. Moore's law hasn't gone away. If an LLM like GPT-3 used one supercomputer's worth of compute for three months back in 2022, and the latest supercomputer used for training is, say, three times more powerful (3x faster processors and 3x the RAM), then training on it should yield a more powerful LLM simply by virtue of scale, with no algorithmic changes. The exact nature of the improvement isn't easily calculable on the back of an envelope, but even with a layman's understanding of how these things work, that doesn't seem like an unreasonable assumption about how things will go, and not "AI CEOs talking their book". Simply running with a bigger context window should make the LLM more useful.
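The back-of-the-envelope version of that argument can at least be written down. Under Chinchilla-style compute-optimal scaling, training compute C is roughly 6*N*D (N parameters, D tokens), and the compute-optimal N and D each grow roughly with the square root of C. The concrete numbers below are GPT-3's published figures, used purely as an illustration, not a claim about any actual model.

    def scale_up(n_params: float, n_tokens: float, compute_multiplier: float):
        """Chinchilla-style approximation: N and D each scale ~ sqrt(compute)."""
        factor = compute_multiplier ** 0.5
        return n_params * factor, n_tokens * factor

    # E.g. 3x the compute of a 175B-parameter model trained on 300B tokens:
    new_params, new_tokens = scale_up(175e9, 300e9, 3.0)
    print(f"~{new_params / 1e9:.0f}B params, ~{new_tokens / 1e9:.0f}B tokens")  # ~303B, ~520B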
Finally, why assume that, absent papers up on arXiv, there haven't been and won't be any algorithmic improvements to training and inference? We've already seen how allowing the LLM to take longer to process the input (e.g. "ultrathink" in Claude) gives better results. It seems unlikely that all possible algorithmic improvements have already been discovered and implemented. That OpenAI et al. aren't writing academic papers to share their discoveries with the world, and are instead keeping those improvements private and proprietary to gain an edge in a very competitive business, seems like a far more reasonable assumption. With literal billions of dollars on the line, would you spend your time writing a paper, or would you try to outcompete your competitors? And if simply giving the LLM longer to process the input before returning user-facing output already helps, what other algorithmic improvements are possible on the inference side, on a bigger supercomputer with more RAM available to it? DeepSeek suggests there's still a ton of optimization yet to be done.
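One concrete family of inference-time improvements is simply "spend more compute per query": sample several candidates and keep the one a verifier likes best. In the sketch below, generate() and score() are placeholders for a model call and a scorer; nothing vendor-specific is assumed.

    from typing import Callable

    def best_of_n(prompt: str,
                  generate: Callable[[str], str],
                  score: Callable[[str, str], float],
                  n: int = 8) -> str:
        """Trade extra inference compute for a better answer."""
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda c: score(prompt, c))

    # Extended thinking ("ultrathink", o-series reasoning, etc.) is a more integrated
    # version of the same trade-off: more tokens of internal computation per answer.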
Happy to hear opposing points of view, but I don't think any of the things I've theorized here are totally inconceivable. Of course there's a discussion to be had about diminishing returns, but we'd need a far deeper understanding of the state of the art on all three facets I raised in order to have an in-depth and practical discussion on the subject. (Which, to be clear, I'm open to hearing, though the comments section on HN is probably not the platform to gain said deeper understanding.)
Unlocking better sample efficiency is algorithmically hard and computationally expensive (with known methods) - but if new high quality data becomes more expensive and compute becomes cheaper, expect that to come into play heavily.
"Produce plausible text" is by itself an "AGI complete" task. "Text" is an incredibly rich modality, and "plausible" requires capturing a lot of knowledge and reasoning. If an AI could complete this task to perfection, it would have to be an AGI by necessity.
We're nowhere near that "perfection" - but close enough for LLMs to adopt and apply many, many thinking patterns that were once exclusive to humans.
Certainly enough of them that sufficiently scaffolded and constrained LLMs can already explore solution spaces, and find new solutions that eluded both previous generations of algorithms and humans - i.e. AlphaEvolve.
Can you damage existing capabilities by overly specializing an AI in something? Yes. Would you expect that damage to stick around forever? No.
OpenAI damaged o3's truthfulness by frying it with too much careless RL. But Anthropic's Opus 4 proves that you can get similar task performance gains without sacrificing truthfulness. And then OpenAI comes back swinging with an algorithmic approach to train their AIs for better truthfulness specifically.
The next round of data is partially AI-generated, which leads to further deterioration.
Even a broken clock is right twice a day.
The question is reliability.
What worked today may not work tomorrow and vice versa.
Like when a relationship is obviously over. Some people enjoy the last fleeting moments, while others delude themselves that they just have to get over the hump and things will go back to normal.
I suspect a lot of the denial is from 30-something CRUD-app lottery winners: one of the smart kids all through school, graduated into a ripping CRUD-app job market, and if they didn't even feel the 2022 downturn, they now see themselves as irreplaceable CRUD-app geniuses. Understandable, since the environment never signaled anything to the contrary until now.
I'm a systems/embedded/GUI dev with 25 years of C++ etc., and nearly every day I'm happy and grateful to be part of the last generation to get really proficient before AI tools made us all super dependent and lazy.
Don't get me wrong, I'm sure people will find other ways to remain productive and stand out from each other (just a new normal), but I'm still glad all that mental exercise and experience can't be taken away from me.
I'm more compelled to figure out how I can contribute to making sure younger colleagues learn all the right stuff and treat their brains with self-respect than I feel any need to "save my own ass" or have any fears about the job changing.
- Reasoning, which is just very long inference coupled with RL
- Tool use, aka an LLM with glue code to call programs based on its output
- "Agents", aka LLMs with tools in a loop (a stripped-down sketch is below)
Those are pretty neat tricks, and not at all trivial to get actionable results from, engineering-wise, mind you. But the days of the qualitative intelligence leaps from GPT-2 to 3, or 3 to 4, are over. Sure, benchmarks do get saturated, but at incredible cost, and by forcing AI researchers to make up new "dimensions of scaling" as the ones they were previously banking on stalled. And meanwhile it's all your basic next-token-prediction blob running it all, just with a few optimizing tricks.
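For anyone who hasn't looked under the hood, the "tools in a loop" part really is that small. A stripped-down sketch, where call_model() stands in for any chat API that can return either a final answer or a tool request, and every name is illustrative:

    import json
    from pathlib import Path

    # Two toy tools; real frameworks add schemas, sandboxing, and error handling.
    TOOLS = {
        "read_file": lambda path: Path(path).read_text(),
        "list_dir": lambda path=".": "\n".join(p.name for p in Path(path).iterdir()),
    }

    def agent_loop(task: str, call_model, max_steps: int = 10) -> str:
        """An LLM with tools in a loop: call the model, run requested tools, repeat."""
        history = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            reply = call_model(history)      # expected shape: {"type": "final"|"tool", ...}
            if reply["type"] == "final":
                return reply["content"]
            history.append({"role": "assistant", "content": json.dumps(reply)})
            result = TOOLS[reply["tool"]](**json.loads(reply["arguments"]))
            history.append({"role": "tool", "content": result})
        return "step budget exhausted"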
My hunch is that there won't be a wondrous, life-changing AGI (poorly defined anyway), just consolidation of existing gains (distillation, small language models, MoE, quality datasets, etc.) and finding new dimensions and sources of data (biological data and 'sense data' for robotics come to mind).
Well, the problem is that the expectations are already massive, mostly thanks to sama's strategy of attracting VC.
e.g. if you look at Altman's blog about "superintelligence in a few thousand days", what he actually wrote doesn't even disagree with LeCun (famously a naysayer) about the timeline.
I doubt it can even beat opus 4.1
This seems to be directly targeted at Anthropic/Claude. I wonder if it leads anywhere or if Claude keeps its mystical advantage (especially with new Claude models coming out this week as well).
> GPT-5 will have four model variants, according to GitHub...
I also find it interesting that the primary model is the logic-focused one (likely very long and deep reasoning), whereas the conversational mainstream model is now a variant. It seems like a fundamental shift in how they want these tools to be used, as opposed to today's primary 4o and the more logical GPT-4.1, o4-mini, and o3.
> gpt-5: Designed for logic and multi-step tasks.