The funny thing about vibe coding is that god-tier vibe coders think they're in DGAF mode. But people who are actually in DGAF mode and just say "Make Instagram for me" think they're in god tier.
But agreed, there needs to be a better way for these agents to figure out what context to select. It doesn't seem like this will be too large an issue to solve, though?
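As a rough illustration of what "selecting context" even means mechanically, here is a toy sketch (not any real product's approach): rank the repo's files by term overlap with the task and hand the top few to the model. Real tools presumably use embeddings, the dependency graph, and edit history instead.

    import os, re
    from collections import Counter

    def tokens(text):
        # crude lexical fingerprint: identifier-ish words and their counts
        return Counter(re.findall(r"[A-Za-z_]\w+", text.lower()))

    def select_context(task, root=".", top_k=3):
        task_toks = tokens(task)
        scored = []
        for dirpath, _, files in os.walk(root):
            for name in files:
                if not name.endswith(".py"):
                    continue
                path = os.path.join(dirpath, name)
                with open(path, errors="ignore") as f:
                    overlap = sum((tokens(f.read()) & task_toks).values())
                scored.append((overlap, path))
        # the highest-overlap files become the prompt context
        return [p for _, p in sorted(scored, reverse=True)[:top_k]]

    print(select_context("fix the retry logic in the http client"))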
Workaccount2 · 1h ago
I don't know why software engineers think that LLM coding ability is purpose-made for them to use, and that because it sort of sucks at it, it's therefore useless...
It's like listening to professional translators endlessly lament about translation software and all its shortcomings and pitfalls, while totally missing that the software is primarily used for property managers wanting to ask the landscapers to cut the grass lower.
LLMs are excellent at writing code for people who have no idea what a programming language is, but a good idea of what computers can do when someone can speak this code language to them. I don't need an LLM to one-shot Excel.exe so I can track the number of members vs non-members who come to my community craft fair.
jmward01 · 1h ago
This is definitely the right problem to focus on. I think the answer is a different LLM architecture that has unlimited context. Transformers trained with causal masks over fixed-size blocks got us here, but they are now limiting us in major ways.
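For anyone unfamiliar with what's being blamed here, a minimal numpy sketch of causal masking (illustration only, not a full transformer): each position can only attend to positions at or before it, and the whole thing runs over a fixed-length block, which is where the hard context limit comes from.

    import numpy as np

    def causal_attention(q, k, v):
        t = q.shape[0]
        scores = q @ k.T / np.sqrt(q.shape[-1])
        mask = np.triu(np.ones((t, t), dtype=bool), 1)  # future positions
        scores[mask] = -np.inf                          # hidden from view
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ v

    x = np.random.randn(6, 16)       # 6 tokens in a fixed-size block
    out = causal_attention(x, x, x)  # context can never exceed the block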
artembugara · 1h ago
What are some startups that help precisely with “feeding the LLM the right context”?
__mharrison__ · 2h ago
Building complex software is certainly possible with no coding and minimal prompting.
This YT video (from 2 days ago) demonstrates it: https://youtu.be/fQL1A4WkuJk?si=7alp3O7uCHY7JB16
The author builds a drawing app in an hour.
The article summed itself up as "Context is everything."
But the article itself also makes the point that a human assistant was also necessary. That's gonna be my takeaway.
summarity · 3h ago
I found the same in my personal work. I have o3 chats (as in OAI's Chat interface) that are so large they crash the site, yet o3 still responds without hallucination and can debug across 5k+ LOC. I've used it for DSP code, to debug a subtle error in an 800+ LOC Nim macro that sat in a 4k+ LOC module (it found the bug), to work on compute shaders for audio analysis, and to optimize graphics programs and other algorithms. Once I "vibe coded" (I hate that term) a fun demo using a color management lib I wrote, which encoded the tape state for a brainfuck interpreter in the deltaE differences between adjacent cells. Using the same prompts replayed in Claude chat and others doesn't even get close. It's spooky.
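(To make the deltaE trick concrete, here is a rough guess at the idea rather than the actual lib: with CIE76, deltaE is just Euclidean distance in Lab space, so if the encoder only ever steps along L, each tape cell's value can be read straight back out of the distance between neighbouring colours. A real encoder would have to fold the walk to stay inside the gamut.)

    import math

    SCALE = 0.1  # lightness units per tape increment (assumed, not from the demo)

    def encode(tape, start=(5.0, 0.0, 0.0)):
        # walk along L only; each step's size encodes one cell's value
        colours, (L, a, b) = [start], start
        for value in tape:
            L += value * SCALE
            colours.append((L, a, b))
        return colours

    def decode(colours):
        # CIE76 deltaE = Euclidean distance in Lab space
        return [round(math.dist(c1, c2) / SCALE)
                for c1, c2 in zip(colours, colours[1:])]

    print(decode(encode([72, 101, 108, 111])))  # -> [72, 101, 108, 111]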
Yet when I use the Codex CLI, or agent mode in any IDE, it feels like o3 regresses to below GPT-3.5 performance. All recent agent-mode models seem completely overfitted to tool calling. The most laughable attempt is Mistral's devstral-small: allegedly the #1 agent model, but outside of scenarios you'd encounter in SWEbench & co it completely falls apart.
I notice this at work as well: the more tools you give any model (reasoning or not), the more confused it gets. But the alternative is to stuff massive context into the prompts, and that has no ROI. There's a fine line to be walked here, but no one is even close to it yet.