Focus and Context and LLMs

95 tarasglek 48 6/8/2025, 9:09:19 AM taras.glek.net ↗

Comments (48)

tptacek · 32d ago

This article is knocking down a very expansive claim that most serious (ie: not vibe-coding) developers aren't making. Their point is that LLM agents have not yet reached the point where they can finish a complicated job end-to-end, and that if you want to do a completely hands-off project, where only the LLM generates any code, it takes a lot of prompting effort to accomplish.

This seems true, right now!

But in building out stuff with LLMs, I don't expect (or want) them to do the job end-to-end. I've ~25 merged PRs into a project right now (out of ~40 PRs generated). Most merged PRs I pulled into Zed and cleaned something up. At around PR #10 I went in and significantly restructured the code.

The overall process has been much faster and more pleasant than writing from scratch, and, notably, did not involve me honing my LLM communications skills. The restructuring work I did was exactly the same kind of thing I do on all my projects; until you've got something working it's hard to see what the exact right shape is. I expect I'll do that 2-3 more times before the project is done.

I feel like Kenton Varda was trying to make a point in the way they drove their LLM agent; the point of that project was in part to record the 2025 experience of doing something complicated end-to-end with an agent. That took some doing. But you don't have to do that to get a lot of acceleration from LLMs.

ofjcihen · 32d ago

It’s almost like unrealistic expectations of LLMs, driven by those working for companies who have something to gain by labeling any skepticism as “crazy”, do significant damage to our perception of its usefulness.

Believe it or not I agree.

tptacek · 32d ago

I'm sorry, I read this comment like 3 times and I still don't understand what it's trying to say. Who are the companies you're talking about and are they too positive on LLMs or too negative?

ofjcihen · 32d ago

Just that blind fanaticism leads to things like constant goal post moving when the product doesn’t live up to the hype. This damages people’s perception of the tool and causes them to be burnt out on it when it isn’t in fact magic.

Instead we should be accepting that people will or won't find uses for it depending on their competency (CRUD app churn vs. somewhat novel creations) and accept that without telling them they’re nuts, luddites, etc.

Then again, like I said, the people doing that usually have something to gain, such as a product related to the hype-generating product.

Here’s an example article that hit the front page of HN this week: https://fly.io/blog/youre-all-nuts/

tptacek · 32d ago

I wrote that article, and I don't believe "any" skepticism of "AI" is nuts. As an existence proof: I think "vibe coding" produces results as bad as skeptics say it does. The article is pretty specific about what it says the nutty claims are.

gsf_emergency · 32d ago

Ahh I finally get it: you were claiming you only hang out with nuts (when it comes to AI)

--the link itself says "you", but that's also addressing your friends I presume?

Edit: the politicians too?

>To the consternation of many of my friends, I’m not a radical or a futurist.

(Apologies if you were tipsy at any point in the relevant parts)

tptacek · 32d ago

I'm sorry, I really can't follow any of this. No two lines of this seem to go together. "The politicians"?

gsf_emergency_2 · 32d ago

The many of your friends who are into radicalism or futurism

The link: fly.io/blog/YOUre-all-nuts/

(XD)

tptacek · 32d ago

Sorry, not getting me any closer to understanding this. The post took something like a month to put together.

gsf_emergency_2 · 32d ago

No problem, it's probably just my schizo talking.

It's just that... framing your post in terms of the opinions of devs that you personally know, and not those of the AI-assistance community at large*, resolved your issue with ofjcihen for me (only for me, it seems :().

*explicitly including e.g., Lisp, Haskell (radical futurists?),

& (dare I mention) SwiftUI devs

ofjcihen · 32d ago

Hah. Perfect. Then you’ll also agree with my statement about differing levels of acceptance I assume?

tptacek · 32d ago

Could you restate it? My whole point in this thread is that the article is knocking down an argument many supporters of "AI coding" aren't making. As I've just demonstrated, it's easy to find a skeptical argument that this "supporter" agrees with.

ofjcihen · 32d ago

I mean I could but I think a better way to get to the heart of this discussion is to ask why you put quotations around supporter when describing the author.

Do you think he’s secretly against the tool itself or do you acknowledge that maybe the tool just doesn’t work for him and his use case and maybe he’s not nuts for finding fault with it?

tptacek · 32d ago

I wasn't referring in any particular way to the author, just acknowledging that there are multiple kinds of supporter.

threeseed · 32d ago

The plural of anecdote is not data.

Let's repeat this process for 100 coding examples and see how many it can complete "hands-off", especially where (a) it isn't a case of "here is a spec and I need you to implement it" and (b) it isn't for a use for which there is already publicly available code.

Otherwise your claim of "this seems true, right now!" is baseless.

tptacek · 32d ago

I can't tell if you're saying I'm being too generous towards LLMs or too skeptical.

jjfoooo4 · 31d ago

I think more specifically, if you don't have strong guidance to the LLM on how a project is organized and where things should go, things go pretty far off the rails.

I tried to have an LLM fully write a Python library for me. I wrote an extensive, detailed requirements doc, and it looked close enough that I accepted the code. But as I read through it more closely, I realized it was duplicating code in confusing ways, and overall it took longer than just getting the bones down myself, first.

Some coding agents are now more actively indexing code; I think that should help with this problem.

summarity · 32d ago

I found the same in my personal work. I have o3 chats (as in OAI's Chat interface) that are so large they crash the site, yet o3 still responds without hallucination and can debug across 5k+ LOC. I've used it for DSP code, to debug a subtle error in an 800+ LOC Nim macro that sat in a 4k+ LOC module (it found the bug), to work on compute shaders for audio analysis, and to work on optimizing graphics programs and other algorithms. Once I "vibe coded" (I hate that term) a fun demo using a color management lib I wrote, which encoded the tape state for a brainfuck interpreter in the deltaE differences between adjacent cells. Using the same prompts replayed in Claude chat and others doesn't even get close. It's spooky.

Yet when I use the Codex CLI, or agent mode in any IDE, it feels like o3 regresses to below GPT-3.5 performance. All recent agent-mode models seem completely overfitted to tool calling. The most laughable attempt is Mistral's devstral-small: allegedly the #1 agent model, but outside of scenarios you'd encounter in SWE-bench & co it completely falls apart.

I notice this at work as well: the more tools you give any model (reasoning or not), the more confused it gets. But the alternative is to stuff massive context into the prompts, and that has no ROI. There's a fine line to be walked here, but no one is even close to it yet.

artembugara · 32d ago

What are some startups that help precisely with “feeding the LLM the right context” ?

jsemrau · 32d ago

Is that really a product? I think it should be solved through workflow and policies rather than providing this to a 3rd party provider. But I might be wrong.

[1] https://jdsemrau.substack.com/p/memory-and-context

tough · 31d ago

Not a startup, and it doesn't remove the need to choose the context yourself, but I paid 200 USD for RepoPrompt (a macOS app).

It's a very niche app, and I haven't used it much since buying it, but there's that: https://repoprompt.com/

drekipus · 32d ago

Anthropic

apwell23 · 32d ago

cursor ?

jmward01 · 32d ago

This is definitely the right problem to focus on. I think the answer is a different LLM structure that has unlimited context. The transformer block, trained with causal masks, got us here, but it is now limiting us in many massive ways.

briian · 32d ago

The funny thing about vibe coding is that God tier vibe coders think they're in DGAF mode. But people who are actually in DGAF mode and just say "Make instagram for me" think they're in god tier.

But agreed, there needs to be a better way for these agents to figure out what context to select. It doesn't seem like this will be too large an issue to solve though?

jumploops · 32d ago

Has the author tried Claude Code?

It’s the first useful “agent” (LLM in a loop + tools) that I’ve tried.

IME it is hard to explain why it’s better than e.g. Aider or Cursor, but once you try it you’ll migrate your workflow pretty quickly.

troupo · 32d ago

It can get surprisingly dumb surprisingly fast.

Today I spent easily half an hour trying to make it solve a layout issue it itself introduced when porting a component.

It was a complex port it executed perfectly. And then it completely failed to even create a simple wrapper that fixed a flexbox issue.

BTW. Claude (Code and Cursor) is over-indexed on "let's randomly add and remove h-full/overflow-auto and pretend it works ad infinitum"

bob1029 · 32d ago

I've found that CSS is one of the more terrible things for an LLM to work on.

It's definitely on point with some strategic layout items, flexbox, etc., but when it comes to anything like colors, margins, padding, typeface, borders, etc., you might as well be throwing darts into the void.

apwell23 · 32d ago

> And then it completely failed to even create a simple wrapper that fixed a flexbox issue.

Yeah, this is the problem with vibe coding. It's hard to understand and keep tabs on the nitty-gritty when stuff is being generated for you. No matter how much you 'review' it, it just doesn't stick the same way it would if you were writing the code. You are really screwed if you have to debug something that the LLM throws its hands up on.

padolsey · 32d ago

How much transparency does Claude Code give you into what it's doing? I like IDE-integrated agents as they show diffs and allow focused prompting for specific areas of concern. And I get to control what's in context at any given time in a longer thread. I haven't tried Claude's thing in a while, but from what I gather it's more of a "prompt and pray" kind of agent.. ?

_neil · 32d ago

My experience is that you can be very targeted in your prompting with Claude Code and it mostly gets good results. You can also ask it early on to create a branch and create logical commits as it works. This way, you can examine smaller code changes later in a PR (or git log).

Or if you want to work more manually, you could do the same but not allow full access to git commit. That way it will request access each time it’s ready to commit and you can take that time to review diffs.

lojack · 32d ago

Definitely not a prompt and pray type thing, though you can do that if you choose. It shows its work, and in the newest version there are three modes (planning, executing, auto accepting edits). You can also hit Esc at any time to redirect it when you see it going in the wrong direction.

apwell23 · 32d ago

> IME it is hard to explain why it’s better than e.g. Aider or Cursor

i have cursor through work but i am tempted to shell out $100 because of this hype.

is it better than using claude models in cursor?

cap11235 · 32d ago

You could always just try it out via a normal Anthropic API key. Probably would be around $3-10 for greenfield implementation of a non-trivial project. "/cost" to see as you go.

quantum_state · 32d ago

Context is all you need :-)

tarasglek · 32d ago

Indeed, that was my original working title

max2he · 32d ago

bruh that's Google's original working title

emorning3 · 32d ago

The article summed itself up as "Context is everything".

But the article itself also makes the point that a human assistant was also necessary. That's gonna be my takeaway.

spmurrayzzz · 32d ago

I agree. And the real lede was buried here IMO:

> This is the single most impressive code-gen project I’ve seen so far. I did not think this was possible yet.

To get that sort of acclaim, a human had to build an embedded programming language from scratch to get to that point. And even with all that effort, the agent itself took $631 and 119 hours to complete the task. I actually don't think this is a knock on the idea at all, this is the direction I think most engineers should be thinking about.

That agent-built HTTP/2 server they're referencing is apparently the only example of this sort of output they've seen to date. But if you're active in this particular space, especially on the open source side of the fence, this kind of work is everywhere. Since these projects don't manifest themselves as super generic tooling that you can apply to broad task domains as a turnkey solution, they don't get much attention.

I've continually held the line that if any given LLM agent platform works well for your use case and you haven't built said agent platform yourself, the underlying problem likely isn't that hard or complex. For the hard problems, you gotta do some first-principles engineering to make these tools work for you.

gaeb69 · 29d ago

Do you have any of those examples handy by chance? Curious to check them out. And agreed! While coding has become a commodity - programming is still as alive as ever.

spmurrayzzz · 28d ago

Absolutely agree with that last sentiment. Recently I came across a project by a former Google engineer called Dyad: https://github.com/dyad-sh/dyad

This is built to be a Lovable/Bolt alternative and is definitely on the early side in terms of total capability and reliability. But once you start digging through the source you realize how much engineering actually went into building it. It is not just chaining prompts together in a dart-throwing exercise and praying for a good result.

This is much closer to the "turnkey" solution vertical I mentioned in my earlier commentary, since it's meant to generically build any web app, but there are a few applied concepts that are shared with the promptyped approach used in the HTTP/2 server (though not as sophisticated when compared to the category theory / type theory approach).

I think it's a good example to work backwards from though; if you peel the onion a bit you realize how much more tightly you could scope this for more bespoke projects.

mathgeek · 32d ago

> In the meantime continue expecting mediocre results from mediocre people feeding LLMs mediocre context.

I can't even with the ego here. The best teachers practice humility.

__mharrison__ · 32d ago

Building complex software is certainly possible with no coding and minimal prompting.

This YT video (from 2 days ago) demonstrates it: https://youtu.be/fQL1A4WkuJk?si=7alp3O7uCHY7JB16

The author builds a drawing app in an hour.

Workaccount2 · 32d ago

I don't know why software engineers think that LLM coding ability is purpose-made for them to use, and that because it sort of sucks at it, it is therefore useless...

It's like listening to professional translators endlessly lament translation software and all its shortcomings and pitfalls, while totally missing that the software is primarily used by property managers wanting to ask the landscapers to cut the grass lower.

LLMs are excellent at writing code for people who have no idea what a programming language is, but a good idea of what computers can do when someone can speak this code language to them. I don't need an LLM to one-shot Excel.exe so I can track the number of members vs non-members who come to my community craft fair.

Nevermark · 32d ago

> LLMs are excellent at

Writing hint: Your last paragraph stands well on its own. Especially if this is, in fact, your actual experience.

Nothing in that paragraph requires the negativity or inaccuracies of the preceding two paragraphs.

There should be a name for the human tendency (we have all done/do it) to weigh down good points with unnecessary and often inaccurate contrast/competition.

abletonlive · 32d ago

I struggle with this a lot so thanks for the hint. I don't entirely agree that good points stand on their own. It's often easy to anticipate the criticism to your point. When arguments are text-based and responses have a tendency to span hours or days, it can be useful to short circuit the argument by just calling out the anticipated criticism. This of course can sometimes lead to comments such as yours, where we go off into a meta side-argument. Or, if you anticipate badly, you unintentionally put even more focus on the anticipated criticism than your original point.

Nevermark · 31d ago

> It's often easy to anticipate the criticism to your point.

You can head off a lot of criticism by not making your point competitive with other reasonable points. I.e. additive to understanding, not subtractive.

Otherwise, you are actually creating the competition between points that you wanted to avoid. And creating your own distractions from your own point.

abletonlive · 19d ago

But ideas are not always additive. It's okay that ideas are competitive and it's okay to try and shut down bad ideas. Easy to find examples. If someone was proposing to extinguish a group of people, would you try to "add" to the conversation?

