Ask HN: Anyone struggling to get value out of coding LLMs?
- solving tasks that just require applying knowledge ("here's a paste of my Python import structure. I don't write Python often and I'm aware I'm doing something wrong here because I get this error; tell me the proper way to organise the package").
- writing self-contained throwaway pieces of code ("here's a paste of my DESCRIBE TABLE output, write an SQL query to show the median [...]").
- as a debugging partner ("I can SSH to this host directly, but Ansible fails to connect with this error, what could be causing this difference").
All these use cases work great, and I save a lot of time. But for the core work of writing the code I actually maintain, I've almost never had any success. I've tried:
- Cursor (can't remember which model, the default)
- Google's Jules
- OpenAI Codex with o4
I found in all cases that the underlying capability is clearly there (the model can understand and write code), but the end-to-end value just isn't. It could write code that _worked_, but getting it to generate code that I'm willing to maintain and "put my name on" took longer than writing the code myself would have.
I had to micromanage them endlessly ("be sure to rerun the formatter, make sure all tests pass", "please follow the coding style of the repository", "you've added irrelevant comments, remove those", "you've refactored most of the file but forgot a single function"). It would take many, many iterations on trivial issues, and because these iterations are slow, that just meant I had to context-switch a lot, which is also exhausting.
Basically it was like having an intern who has successfully learned the core skill of programming but is not really capable of good collaboration and needs to be babysat all the time.
I asked friends who are enthusiastic vibe coders and they basically said "your standards are too high".
Is the model for success here that you just say "I don't care about code quality because I don't have to maintain it because I will use LLMs for that too?" Am I just not using the tools correctly?
There seem to be two camps. Those who can't stop raving about how much of a superpower LLMs are for coding, how they've made them 100x more productive, and how they're unlocking things they could've never done before.
And those who, like you, find it to be an extremely finicky process that requires an extreme amount of coddling to get average results at best.
The only thing I don’t understand is why people from the former group aren’t all utterly dominating the market and obliterating their competitors with their revolutionary products and blazing fast iteration speed.
If I want to create a React app with X pages, some Redux stores, auth, etc., then it can smash that out in minutes. I can say "now add X" and it'll do it, generally with good results.
But when it comes to maintaining existing systems, adding more complicated features, or needing to know business domain details, an LLM is usually not that great for me. They're still great as a code suggestion tool, finishing lines and functions. But as far as delivering whole features, they're pretty useless once you get past the easy stuff. And you'll spend as much time directing the LLM to do this kind of thing as you would just writing it yourself.
What I tend to do is write stubbed out code in the design I like, then I'll get an LLM to just fill in the gaps.
These people who say LLMs make them 100x more productive probably are only working on greenfield stuff and haven't got to the hard bit yet.
Like everyone says, the first 90% is the easy bit. The last 10% is where you'll spend most of your time, and I don't see LLMs doing the hard bit that well currently.
I have yet to see an LLM-generated app not collapse under its own weight after enough iterations/prompts. It gets stuck in loops (removing and adding the same code/concept), it gets hung up on simple errors, etc.
For greenfield it’s amazing, no doubt, but unless you are watching it very closely and approving/reviewing the code along the way it will go off the rails. At a certain point it’s easier to add the new feature or make the modification yourself. Even if the LLM could do it, it would burn tons of money and time.
I expect things to get better, this will not always be the state of things, but for now “vibe coding” (specifically not reviewing/writing code yourself) is not sustainable.
Most people doing it have a github profile that is a mile wide and a meter deep.
LLM’s are amazing and useful, but “vibe coding” with them is not sustainable currently.
It will not.
And I say this as someone who's been building internal LLM tools since 2021.
The issue is their context window. If you increase the context window so they can see more code, costs skyrocket as n^2 in the size of the code base. If you don't, then you have all the issues people describe in this thread.
The reason why I have a job right now is that you can get around this by building tooling for intelligent search that limits the overfill of each context window. This is neither easy, fast, nor cheap when done at scale. Worse, the problems you hit when doing this are at best weakly related to the problems the major AI labs are focusing on currently. I've interviewed at two of the top five AI labs, and none of the people I talked to cared about or really understood what a _real_ agentic system that solves coding should look like.
Then specify the need to conclude the work by a deadline.
These kinds of things cause the LLM to “finish” tasks and try to move on, or say it is done when it actually is done.
This won’t let you skip the curation of output along the way, but at least some of the stumbling around can be handled with prompting.
Not 100x more productive, that's an exaggeration... not even 10x. But it helps. It is an extremely competent rubber duck [1].
[1] https://en.wikipedia.org/wiki/Rubber_duck_debugging
Greenfield != boilerplate and basic CRUD app.
I'm a consultant writing greenfield apps solo, and 90% of my time is spent away from my editor thinking, planning, designing, meeting with stakeholders. I see no benefit in using a low-IQ autocomplete tool to automate a small part of the remaining 10% of the job, the easiest and most enjoyable part in fact.
Personally I find most of the coding I do is unsuitable for LLMs anyway, because I don't need them to regurgitate standard logic when libraries are available, so most of that 10% is writing business logic tailored to the program/client.
Call me elitist (I don't care), but LLMs are mostly useful to two kinds of people: inexperienced developers, and those who think that hard problems are solved with more code. After almost two decades writing software, I find I need less and less code to ship a new project; most of my worth is thinking hard away from a keyboard. I really don't see the use of a machine that egregiously and happily writes a ton of code. Less is more, and I appreciate programming-as-an-art rather than being a code monkey paid by the line of code I commit.
Disclaimer: I am anti-LLM by choice, so my bias runs opposite to most of HN's.
This is where both sides are basically just accusing the other of not getting it
The AI coders are saying "These tools are a gamechanger in the hands of skilled developers" implying if you aren't getting gamechanging results you aren't skilled
The non-AI coders are basically saying the same thing back to them. "You only think this is gamechanging because you aren't skilled enough to realize how bad they are"
Personally, I've tried to use LLMs for coding quite a bit and found them really lacking
If people are finding a lot of success with them, either I'm using them wrong and other people have figured out a better way, or their standards are way, way lower than mine, or maybe they wind up spending just as long fixing the broken code as it would take me to write it
What does it add that you couldn't have written yourself faster if you're so skilled?
I think this is needlessly snarky and also presupposes something that wasn't said. No one said it can write something that the developer couldn't write (faster) themselves. Tab complete and refactoring tools in your IDE/editor don't do anything you can't write on your own but it's hard to argue that they don't increase productivity.
I have only used Cline for about a week, but honestly I find it useful in an (imo badly organized) codebase at work as an auto-grepper. Just asking it "where does the check for X take place" in a codebase I rarely touch, with tons of inheritance and auto-constructor magic, it does a pretty good job of showing me the flow of logic.
Not discounting your experience, but a lot of these examples are about frameworks that never had the good bootstrapping that Rails does/did. LLMs are really good at boilerplate, but maybe this points to such stacks needing too much fiddling to get going, rather than to 10x coder AI.
An LLM makes that a bit easier and faster to do, but it is also a bit more error-prone than a template project.
1. You have to remember all the technologies you need included; my company template already has them.
2. An LLM doesn't have a standardized directory structure, so you end up with different projects having different structures and file naming conventions. This makes later refactoring or upgrades across multiple projects less automatable (sometimes this can be solved by having an LLM do those too, but it is often still unsuccessful in some projects).
3. LLMs have a knowledge cutoff. If your company has already moved to a version after that knowledge cutoff, you need to upgrade the LLM generated code.
I very much prefer having a company template to asking an LLM to generate the initial project.
Unless you're a contractor making basic websites for small businesses, how many of these do you need to make? This is really a small fraction of the job of most developers, except for entry-level devs.
> when it comes to maintaining existing systems, or adding more complicated features, or needing to know business domain details,
This is what experienced developers will spend 90% of their time doing.
So yes, LLMs can replace entry-level devs, but not the more experienced ones.
This begs the question: if companies stop hiring entry-level devs because LLMs can do their job, how will new devs get experience?
An LLM is very capable of implementing all of that… if only someone who knows all of that stuff tells them first.
And most importantly: LLMs don’t challenge the task given. Engineers do. Many times, problems are solved without code.
Knowing what’s possible, difficult, easy, risky, cheap, expensive, etc.
I mean that in the sense that Excel is the tool that non developers could take the farthest to meet their needs without actually becoming a full time developer. But then to take it past that point a professional developer needs to step in.
I imagine non devs vibe coding their way to solutions far more complex than Excel can handle. But once they get past the greenfield vibe coding stage they will need a pro to maintain it, scale it, secure it, operationalize it, etc.
It can do pieces in isolation, but requires significant handholding on refactors to get things correct (i.e., its initial version has subtle bugs), and sometimes requires me to read the docs to find the right function, because it hallucinates that things from other frameworks will work in its code.
It is something that future versions could fix, if the context an LLM can handle grows and if you could also fix it so it could handle debugging itself. Right now it can do that in short bursts, and it is not bad at it, but it will get distracted quickly and do other things I did not ask for.
One of these problems has a technical fix that is only limited by money; the other does not
I just rewrote 300-ish advanced PostgreSQL queries into MySQL queries. The process is not magical, but it will take me 1 week rather than 3 days. Now I'm in the testing phase; seems promising.
The point is, if we can find a way to work alongside the agent, we can be very productive.
Translation from one set of tokens to another is exactly the primary use case of LLMs. This is exactly what it should be good at. Developing new queries, much less so. Translation from one set of database queries to another was already very well defined and well covered before LLMs came about.
I've found great success with LLMs in the research phase of coding. Last week I needed to write some domain-specific linear algebra and because of some other restrictions, I couldn't just pull in LAPACK. So I had to hand code the work (yes, I know you shouldn't hand code this kind of stuff, but it was a small slice and the domain didn't require the fully-optimized LAPACK stuff). I used an LLM to do the research part that I normally would have had to resort to a couple of math texts to fully understand. So in that case it did make me 100x more effective because it found what I needed and summarized it so that I could convert it to code really quickly.
For the fun of it, I did ask the LLM to generate the code for me too, and it made very subtle mistakes that wouldn't have been obvious unless you were already an expert in the field. I could see how a junior engineer would have been impressed by it and probably just checked it in and moved on.
I'm still a firm believer in understanding every bit of code you check in, so even if LLMs get really good, the "code writing" part of my work probably won't ever get faster. But for figuring out what code to write - I think LLMs will make people much faster. The research and summarize part is amazing.
The real value in the world is synthesis and novel ideas. And maybe I'm a luddite, but I still think that takes human creativity. LLMs will be a critical support structure, but I'm not sold on them actually writing high-value code.
If you’re near that ceiling you get most value out of code reviews and those side features you don’t care about that allow you to focus more on the actual problem.
That’s a 10-20% lift instead of 10/100x.
This is what I've found it most helpful for. Typically I want an example specific to my scenario, so I use an LLM to generate that scenario and then ask questions about it. It helps me go from understanding a process at a high level to learning more about what components are involved at a lower level, which lets me then go do more research on those components elsewhere.
2x may not sound like much compared to what you read in the media, but if a few years ago you had told companies that you can provably make their engineers even 2x more productive on average you'd probably be swimming in billions of dollars now. That kind of productivity increase is absolutely huge.
I understand how to use LLMs; it's just a much worse workflow. Writing code isn't the hard part; reviewing code is a much slower and more painful process than writing it from scratch, when you actually know what you're doing.
I'm absolutely convinced that people who are saying LLMs are making them more productive either weren't very productive to begin with or they don't know what they're doing enough to realize what a bad job the LLM is doing
LLMs are useful to me personally as a learning tool. It's valuable to have one talk through something or explain why something is designed a certain way. So it's definitely helped me grow as an engineer. But I have not come close to being 2x as productive; I don't think my workload would have needed 2 people in the pre-LLM era. Maybe 1.25?
Because for the most part being a computer programmer is not about programming the computer but about… everything else.
Overall, LLMs are a net negative for humanity in my opinion. The amount of slop on the internet now is absurd. I can never be sure if I'm reading the writing of a real person or a machine, which makes me feel like I'm slowly being brain-poisoned by the devil himself on a psychological level. I have to think twice when I read any email longer than 2 paragraphs. Or did my coworker just start using more em dashes?
And also, if it's so damn easy for a machine to do all of this complex, valuable work for a few kilojoules, how complex or valuable is it anyway? Maybe it's all just a fiction, that any of this stuff provides any value to anyone. Is another app really actually necessary? What about a hundred million new apps?! Even if we all became 100x as "productive," what would that actually mean, in the real world, you know, the one that's outside, that has smells and textures and objects and people in it talking to one another face to face with their wet vocal cords??
These things are going to drive us all absolutely insane I can guarantee it. The logical end of this is complete madness.
Then companies decide to hire less, the market gets flooded with junior AI-powered skills, and mid-level programmers can't ask for raises as easily anymore, unless you're a senior dealing with really specific stuff that needs know-how.
Apparently there are at least three as I fit neither of these molds of yours. They are neither making me 100x more productive, nor am I putting in an extreme amount of coddling.
For context, in the 30ish years I've been writing code professionally, I've *always* needed to look stuff up constantly. I know what I want, and I know what's possible. I just can never remember things like syntax, specifics, that sort of thing. It doesn't help that I often wind up in jobs where I'm switching between languages often.
Before quality search engines like Google, this meant I was always flipping through reference books, man pages, and worse. Then came search engines, and that was a step more productive. Then came things like SO, and that was another step more productive. I find LLMs to be another step forward, but not like a quantum leap or anything. Take autocomplete suggestions, for instance: sometimes it's exactly the prompt (pardon the pun) that I need to jog my memory. Other times I know it's not what I want and I ignore it. And the chat interfaces I find better than Googling SO, as I can have a back and forth with them.
A few years ago I did it with something similar to Math Academy (from the University of Amsterdam). ChatGPT wasn't intelligent enough back then, so I didn't use it. It felt a lot tougher.
ChatGPT answers questions that a teacher would find obscure nonsense, but I'm the type of student who needs to know the culture behind math so I can empathize with the people doing it and find solutions that way. Like: why is the letter m used when talking about an angle? Mathematically, it shouldn't matter. But it irritates me, as I'd use a instead; it makes more sense since you want to lower the cognitive burden and focus as much on the math as possible. So I asked ChatGPT, and one of the things it said is that historically it meant "measure". Now I get it and can focus on the actual math again.
Another example is the fast variance calculation: average_of_squares - square_of_the_average
how does that come from (1/n) * sum((x - mean)^2) ?
It just shows it.
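For reference, the algebra is just a short expansion of the definition (writing \bar{x} for the mean):

    \frac{1}{n}\sum_i (x_i - \bar{x})^2
      = \frac{1}{n}\sum_i \left(x_i^2 - 2\bar{x}x_i + \bar{x}^2\right)
      = \frac{1}{n}\sum_i x_i^2 - 2\bar{x}\cdot\frac{1}{n}\sum_i x_i + \bar{x}^2
      = \frac{1}{n}\sum_i x_i^2 - \bar{x}^2

i.e. the average of the squares minus the square of the average.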
World domination? Nope, but I'm learning something I otherwise wouldn't. But as you can tell, YouTube and Math Academy play their own role in it.
This isn’t the best example, because a simple Google search gives you immediate results for that as well, and arguably more nuanced ones. You’ll learn, for example, that m is also the standard letter for slope, which may or may not be related.
Also, if a teacher finds that question “obscure nonsense”, I’d argue that they are a bad teacher.
Of course, some of it is a bit strange for a modern audience; the way lots of things in calculus are taught nowadays is completely different. If you are looking for rigorous formal proofs, you won't find them in this book.
But it is a quite funny and quirky book that gives you a different perspective on math concepts and will satisfy your hunger for knowing the people and the historical context behind them.
This is a great example of why using LLMs to learn is perilous. This is nonsense.
This is what it wrote by the way.
I posted an image and asked:
Why is this m
m for angle mangle?
ChatGPT:
In geometry we often need to distinguish the angle itself (a geometric object) from the numerical size of that angle (a real-number measurement, usually in degrees or radians). To keep the two ideas separate we put a little “m” in front of the angle symbol:
m∠B reads as “the measure of angle B.”
* ∠B refers to the actual wedge-shaped region determined by the two rays that meet at vertex B.
* m∠B is the single number—e.g. 90°—that tells you how “wide” that wedge is.
So in the worked example the author writes
m∠A + m∠B + m∠C + m∠D = 360°
because they are adding the sizes of the four interior angles of a quadrilateral, not the angles themselves.
If you dropped the m you’d be adding geometric objects, which doesn’t make sense; the m reminds the reader that what’s being added are real-number measures. (Some textbooks use ∣∠B∣ or simply write the letter with a bar over it, but “m∠” is by far the most common in U.S. and many international curricula.)
So the mysterious “m” isn’t another variable—just a shorthand for “measure of.”
---
Edit: this took quite some work to write well as ChatGPT answers don't copy/paste neatly into HN. So I won't be doing this for the other example.
∠ is traditionally a function from points to axiomatic geometric objects. ∠ABC is the angle at B oriented so that we start at A, go to B, then to C.
Your text seems to be using ∠ either as a kind of type annotation (indicating by ∠B that B is an angle) or (perhaps more likely) is just suppressing the other letters in the triangle and is short for something like ∠ABC.
Since ∠B is an axiomatic Euclidean object, it has no particular relation to the real numbers. m is an operator or function that maps axiomatic angles to real numbers in such a way that the calculations with real numbers provide a model for the Euclidean geometry. Why call it m? I'm not aware of it being historical, but almost certainly it comes from measure, like the μ in measure theory.
Obviously ∠ is a graphical depiction of an angle, and my guess is it probably evolved as a shorthand from the more explicit diagrams in Euclid.
Traditionally angles are named with variables from the beginning of the Greek alphabet: α, β, γ. Then we skip to θ presumably to avoid the Greek letters that look nearly identical to Roman letters.
I conflated this with another ChatGPT conversation, where it gave 3 possible historical sources for another symbol that I had tripped over and then had trouble proceeding.
It isn't customarily used for angles (those get Greek letters).
The m stands for mystery.
Edit: ah, but I see that this prefixed m for "measure" is also used sometimes. It appears at https://en.wikipedia.org/wiki/Angle#Combining_angle_pairs though I'm not sure why it's necessary. Maybe because you want to clarify absolute magnitude is meant, to avoid adding negative values.
It is really hard to follow you if you don't explain yourself.
I'm not saying it's factual. The reason I showed that answer was simply to verify to see if it was what you thought it was (hence I asked "is it?"). It turns out that it wasn't fully.
These types of answers from teachers, co-students, web communities, blogs, etc. are – I would assume – why people ask LLMs in the first place.
Could you please assume a good faith discussion?
Most people don't work on anything technically hard, most problems are business logic issues that aren't solved technically or legacy code workarounds for which you need to put 3-10 domain experts in a room for a few hours to solve.
I think it's helpful to think of engineering as a _process_ instead of a role, and the reality is that a lot of development work doesn't necessarily rely on the strong engineering methodology (e.g. measurement, material properties, tolerances, modelling, etc.) that the people developing the software might imagine just based on the number of job adverts for 'engineers'.
This isn't a bad thing. There are hundreds or thousands of different but equally valid solutions to getting a program to do a thing; recognising that most code writing sits somewhere between art and engineering (it is neither a purely artistic discipline nor, in most cases, a purely engineering one) is useful.
The kinds of engineering and architecture that people think of in software development only really represent common practices and shared language (e.g. design patterns, architectural patterns) and not a strong engineering practice or any kind of truth about how software actually runs.
(Unless you're writing software for launching rockets, in which case the engineering _process_ probably should be strong.)
It's probably true that a lot of development work doesn't rely on those. It's probably also true that a lot of the work other kinds of engineers do doesn't rely on them either.
That said, when engineering software systems, those are very important. Measurement: resource sizing, observability; tolerances: backoffs, back pressure, queues, topics, buffers; modelling: types, syntax, data analytics...
There's a whole class of developers out there who are not aware of or very good at those. And that's fine. There's a place for them in the market. You don't need an engineer to work on your floor joists or your plumbing. Sure, you can have one, but you can also hire a builder or DIY it all yourself.
All of these middling developers who are so excited about these tools don't seem to realize that they are perfectly primed to eliminate precisely the kind of worker responsible for hooking up existing APIs for solved problems into a web app. The kind of work the market hitherto greatly over-esteemed and overpaid for isn't going to be so esteemed or highly valued anymore. The only real work that will remain is actual hard engineering work (solving novel technical modeling problems, not just plugging APIs together). All of these lazy devs are hyped for precisely the automation that's going to significantly reduce their pay and labor prospects in the long term, lol. I'm shocked at times at how people can fail to see the writing on the wall when it's been written in gargantuan red ink.
And realistically, there is also the question of where this claim is coming from. Someone being 100 times faster with AI is probably not starting from a place of high competence. Of the 100x, 90x is probably just filling the gap up to peak devs. So in the end they would probably be as productive as a highly competent dev in some domain, just that they can more easily access any domain without years of experience in it. And if we are honest, we somewhat have that already without AI, just from people copying and cookie-cutting any sh*t.
Cursor autocompletes a lot of stuff I would have typed anyway, and LLM search is a strong complement to Google, though not a direct replacement. Generating more than a few lines usually gives me bad code, but often suggests libraries I didn't know about, especially outside my area of expertise.
There's just not a lot of people talking about middling results because the extremes tend to eat up the conversation.
I have gone from Senior II, to Lead Architect, to Principal Engineer in the years LLMs have been available.
It's not literally a 100x multiplier, but I have leveraged their ability to both fill in details where I lack specific knowledge, and as a first point of contact when researching a new space.
LLMs have enabled me to widen my sphere of influence by deepening my abilities at every point of my T-shaped knowledge curve.
I'm finding that these tools are supporting an ever-expanding portion of my work, while remaining frequently frustrating.
My game dev is going to be a hybrid approach going forward. I'll ask it to create a new project with the desired functionality, see how it proposes the solution, and then incorporate that into my main code branch.
I find this easier than the Cursor rollback or being overly cautious with suggested edits. I'm just getting to this realization, so TBD if it works well. FYI, I'm good at programming generally, but game dev is my Achilles heel. We'll see how it goes.
I think something like a graph where nodes are specified clearly with inputs and outputs (or pre- and post-conditions, if you prefer), and agents iteratively generate and refine the implementation details could have some utility, but that would require programmers to know how to think about program design (and have a usable UI/UX for such a tool) instead of hastily stumbling through patches and modifications to existing code they already don’t understand.
The new LLM systems are carrying us farther from that world, not closer to it.
I am very much in that category.
I would describe my experience as the old adage "why keep a dog and bark yourself".
And no, I don't buy into people who say you have to engage in "prompt engineering". That's bullshit. Let's face it "prompt engineering" is a synonym for "I have to waste time coming up with hacky workarounds to make this dumb LLM come up with something usable".
In the end, it's always quicker and easier to do it myself. And unless you are fresh out of school, if you've got more than 10 minutes of coding experience, you will always do a better job than the LLM.
Every time a new LLM model comes out, I fall into the same old trap "oh well, this one must be better, surely" I say to myself.
And yet it's always the same "you can't polish a turd but you can roll it in glitter" experience.
I'm sick and tired of LLMs hallucinating shit.
I'm sick and tired of LLMs inventing functions that have been available in stdlib for years.
I'm sick and tired of LLMs generating useless code with boilerplate subfunctions that just contain a single commented "implement it yourself" line.
On top of that LLMs are simply terrible for the environment, guzzling up all that electricity, producing nothing but hot air and bullshit results in return.
Basically I'm sick and tired of LLMs and all the LLM fetishists that surround them.
I think it'll calm down a bit. People are just making huge claims right now which is music to the ears of many.
I like the tools, don't get me wrong, but they aren't going to make a huge difference day to day. The area where they will make a huge difference is in helping people work out issues faster. You know those multi-hour or multi-day issues you run into where you go crazy reading and re-reading the code? The LLMs are good at giving you a fresh look at the code that you'd otherwise need to get more people to help with, or go sleep on for a day.
The time savings here is probably the most significant time savings AI code editors or agents will ever provide.
The result you described is coming soon. CheepCode[0] agents already produce working code in a satisfying percentage of cases, and I am at most 3 months away from it producing end-to-end apps and complex changes that are at least human-quality. It would take way less if I got funded to work on it full time.
Given that I'm this close as a solo founder with no employees, you can imagine what's cooking inside large companies.
[0] My product, cloud-based headless coding agents that connect directly to Linear, accept tickets, and submit GitHub PRs
This is a strong claim. I'm not aware of any AI model or system that can consistently and correctly make complex changes in existing codebases. What makes you so confident that you can achieve this, alone, in the next 3 months?
So LLMs have only managed to 100x moderately competent programmers, and maybe 2xed extremely competent ones?
1. The ones writing Medium posts about being 100x more productive.
2. The skeptics, like yourself, who construct made-up examples to prove it’s impossible to get value, framing things in a way where failure is the only plausible outcome.
3. And then there’s the middle group: people quietly getting real, uneven, but worthwhile value.
You’d be surprised. The vast majority of engineering isn’t that unique. Usefulness often depends on the age of the codebase and the dynamics of the team. I’m firmly in the middle camp: far from perfect, but I get real value from these tools as they stand today. It’s like having a junior engineer who can scaffold out the basics so I can spend my mental energy on the parts that actually matter.
Seriously, why is every comment hyping up AI generated code like this. No concrete evidence, just turtles all the way down. But when actual examples do show up, it's a complete clown show [1][2].
[1]: https://news.ycombinator.com/item?id=44050152
[2]: https://news.ycombinator.com/item?id=43907376
To be fair, I'm not sure the quality of the code is much worse than all the boatloads of crapware in all the app stores out there.
I suspect that the reason "revolutionary products" and "blazing fast iteration" isn't blowing our minds is because the problem was never the code - it was always the people around it, who know better than their customers. The best way to make good software is to test it on real people and iterate - that's going to take a long time no matter how fast the coding part is.
People are building sophisticated products in days not months now. If you can't see that, it's your issue.
Edit: This says a quarter, where did you get the 30% figure from?
https://techcrunch.com/2025/03/06/a-quarter-of-startups-in-y...
Learn some real skills or prepare to not have a job.
For example, using an LLM to help you write a Dockerfile when you write Dockerfiles once a project and don't have a dedicated expert like a Deployment Engineer in your company is fantastic.
Or using an LLM to get answers faster than google for syntax errors and other minor issues is nice.
Even using LLM with careful prompting to discuss architecture tradeoffs and get its analysis (but make the final decision yourself) can be helpful.
Generally, you want to be very careful about how you constrain the LLM through prompts, to ensure you keep it on a very narrow path so that it doesn't do something stupid (as LLMs are prone to do). You also often have to iterate, because LLMs will occasionally do things like hallucinate APIs that don't actually exist. But even with iteration it can often make you faster.
This made me think of Gell-Mann amnesia: https://en.m.wikipedia.org/wiki/Gell-Mann_amnesia_effect
Basically, when you understand the problem domain, you know the llm is generating crap that, as the OP says, you're unwilling to put your name on. But when it's in a domain you don't know, you forget about how crappy the LLM is normally and go with the vibe.
I find it's best for that, but once things get complicated you're just arguing with the LLM when you should have done it yourself.
I must say, the solution was very creative. It involved using PowerShell to script AutoHotkey to open Photoshop and use AHK to automate exporting each image as a WebP, closing Photoshop after exporting each image. I don't have a Photoshop licence, and I don't know why PowerShell would be needed to script another scripting tool. I also suspect that Photoshop or another Adobe tool probably has a bulk converter. But I do need to give points for originality: that is exactly the type of harebrained solution I would sarcastically suggest if I wanted to mildly troll a coworker.
https://imagemagick.org/script/command-line-tools.php
The "solution" stated is hilarious.
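For comparison, the sane route is a handful of lines. A rough Python sketch (assuming Pillow built with WebP support; the images/ folder and *.png glob are placeholders):

    # Batch-convert a folder of images to WebP with Pillow.
    # Folder name and extension are placeholders.
    from pathlib import Path
    from PIL import Image

    for src in Path("images").glob("*.png"):
        with Image.open(src) as im:
            im.save(src.with_suffix(".webp"), "WEBP")

ImageMagick's mogrify -format webp *.png does the same from the shell, per the link above.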
I use a wide variety of tools. For more private or personal tasks, I mostly rely on Claude and OpenAI; sometimes I also use Google or Perplexity—whichever gives the best results. For business purposes, I either use Copilot within VSCode or, via an internal corporate platform, Claude, OpenAI, and Google. I’ve also experimented a bit with Copilot Studio.
I’ve been working like this for about a year and a half now, though I haven’t had access to every tool the entire time.
So far, I can say this:
Yes, LLMs have increased my productivity. I’m experimenting with different programming languages, which is quite fun. I’m gaining a better understanding of various topics, and that definitely makes some things easier.
But—regardless of the model or its version—I also find myself getting really, really frustrated. The more complex the task, the more I step outside of well-trodden paths, and the more it's not just about piecing together simple components… the more they all tend to fail. And if that’s not enough: in some cases, I’d even say it takes more time to fix the mess an LLM makes than it ever saved me in the first place.
Right now, my honest conclusion is this: LLMs are useful for small code completion tasks, troubleshooting, and explaining — but that's about it. They're not taking our jobs anytime soon.
> - Cursor (can't remember which model, the default)
> - Google's Jules
> - OpenAI Codex with o4
Cursor's "default model" rarely works for me. You have to choose one of the models yourself. Sonnet 4, Gemini 2.5 Pro, and for tricky problems, o3.
There is no public release of o4; you used o4-mini, a model with poorer performance than any of the frontier models (Sonnet 4, Gemini Pro 2.5, o3).
Jules and Codex, if they're like Claude Code, do not work well with "build me a Facebook clone"-type instructions. You have to break everything down and make your own tech stack decisions, even if you use these tools to do so. Yes, they are not perfect: they introduce regressions, forget to run linters, or skip checking their work with the compiler. But they do work extremely well if you learn to use them, just like any other tool. They are not yet magic that works without you having to put in any effort to learn them.
I reflected once that very little of my time as a senior engineer is actually spent just banging out code. The actual writing of the code is never the hard part or time-consuming part for me - it's figuring out the right architecture, figuring out how to properly refactor someone else's hairball, finding performance issues, debugging rare bugs, etc. Yes, LLMs accelerate the process of writing the boilerplate, but unless you're building brand new products from scratch every 2nd week, how much boilerplate are you really writing? If the answer is "a lot", you might consider how to solve that problem without relying on LLMs!
LLMs are better at reading terrible documentation than the average programmer. So in that sense (reading obscure text and explaining it better) there seems to be a clear value add.
> If the answer is "a lot", you might consider how to solve that problem without relying on LLMs!
Aren't there languages with a lot of boilerplate though?
LLMs don't go and read the terrible documentation for you when prompted. They reproduce the information posted by other people that struggled with said terrible documentation, if it was posted somewhere.
It's still better than a modern web search or struggling with the terrible documentation on your own - for introductory stuff.
For going into production you have to review the output, and reading code has always been harder than writing it...
The limits of what can be automated have been clear for almost a century now. Yet people insist on selling ways to automate your way out of social, political, and economic problems. Come to think of it, people should also read Conway's paper as well.
It is amazing how in our field we repeatedly forget this simple advice from Fred Brooks.
In my experience, LLMs are way more useful for coding and less problem-prone when you use them without exaggerated expectations and understand that it was trained on buggy code, and that of course it is going to generate buggy code. Because almost all code is buggy.
Don't delegate design to it; use functional decomposition, do your homework, and then use LLMs to eliminate toil, to deal with the boring stuff, to guide you on unfamiliar territory. But LLMs don't eliminate the need for you to understand the code that goes with your name. And usually, if you think a piece of LLM-generated code is perfect, remember that maybe the defects are there and you need to improve your own knowledge and skills to find them. Always be suspicious; don't trust it blindly.
As you say, it's great for automating away boring things; as a more complicated search & replace, for instance. Or, "Implement methods so that it satisfies this interface", where the methods are pretty obvious. Or even "Fill out stub CRUD operations for this set of resources in the API".
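To make that last one concrete, a rough sketch of the kind of stub CRUD layer I mean, filled in the obvious way (the "note" resource and the in-memory dict are invented for illustration):

    # Stub CRUD operations of the sort I'd hand to an LLM to flesh out.
    # The resource name and in-memory storage are placeholders.
    from typing import Dict, Optional

    _notes: Dict[int, str] = {}

    def create_note(note_id: int, text: str) -> None:
        _notes[note_id] = text

    def read_note(note_id: int) -> Optional[str]:
        return _notes.get(note_id)

    def update_note(note_id: int, text: str) -> bool:
        if note_id not in _notes:
            return False
        _notes[note_id] = text
        return True

    def delete_note(note_id: int) -> bool:
        return _notes.pop(note_id, None) is not None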
I've recently started asking Claude Opus 4 to review my patches when I'm done, and it's occasionally caught errors, and sometimes has been good at prompting me to do something I know I really should be doing.
But once you get past a certain complexity level -- which isn't really that far -- it just stops being useful.
For one thing, the changes that need to be made often span multiple files, each of which is fairly large; so I try to think carefully about which files would need to be touched to make a change, at which point I find I already have an idea of what needs to be changed anyway.
That said, using the AI like a "rubber duck" programmer isn't necessarily bad. Basically, I ask it to make a change; if it makes it and it's good, great! If it's a bit random, I just take over and do it myself. I've only wasted the time of reviewing the LLM's very first change, since nearly everything else I'd have had to do anyway if I'd written the patch myself from scratch.
Furthermore, I often find it much easier to take a framework that's mostly in the right direction and modify it the way that I want, than to code up everything from scratch. So if I say, "Implement this", and then end up modifying nearly everything, it still seems like less effort than starting from scratch myself.
The key thing is that I don't work hard at trying to make the LLM do something it's clearly having trouble with. Sometimes the specification was unclear and it made a reasonable assumption; but if I tell it to do something and it's still having trouble, I just finish the task myself.
- I find it is pretty good at making fairly self-contained react components or even pages especially if you are using a popular UI library
- It is pretty reliable at making well-defined pure functions and I find it easier to validate that these are correct
- It can be good for boilerplate in popular frameworks
I sometimes feel like I am losing my mind, because people report these super powerful end-to-end experiences, and I have yet to see anything close in my day-to-day usage despite really trying. I find it completely falls over on a complete feature. I tried using aider, and people seem to love it, but it was just a disaster for me. I wanted to implement a fairly simple templated email feature in a Next.js app, the kind of thing that would take me about a day. This is one of the most typical development scenarios I can imagine. I described the feature in its entirety and aider completely failed, not even close. So I started describing sub-features one by one, and it seemed to work better. But as I added more and more, existing parts began to break; I explained the issues to aider and it just got worse and worse with every prompt. I tried to fix it manually, but the code was a mess.
I've been writing code for 36 years, so I don't take any of the criticism to heart. If you know what you are doing, you can ship production quality code written by an LLM. I'm not going to label it "made by an AI!" because the consumer doesn't care so long as it works and who needs the "never AI!" backlash anyway?
But to the OP: your standards are too high. AI is like working with a bright intern, they are not going to do everything exactly the way that you prefer, but they are enthusiastic and can take direction. Choose your battles and focus on making the code maintainable in the long term, not perfect in the short term.
Sure, vibe coders by definition can't have any standards for the code they're generating, because they never look at it.
> Is the model for success here that you just say "I don't care about code quality because I don't have to maintain it because I will use LLMs for that too?"
Vibe coding may work for some purposes, but if it were currently a successful strategy in all cases, or even narrowly for improving AI, Google AI or DeepSeek or somebody would be improving their product far faster than mere humans could, by virtue of having more budget for GPUs and TPUs than you do, and more advanced AI models, too. If and when this happens you should not expect to find out by your job getting easier; rather, you'll be watching the news and extremely unexpected things will be happening. You won't find out that they were caused by AI until later, if ever.
Fast food, the assembly line, the factory may be examples, but there is a HUGE catch: when a machine with a good setup makes your burger, car, or wristwatch, you can be 99.99% sure it is as specified. You trust the machine.
With LLMs, you have to verify each single step, and if you don't, it simply doesn't work. You cannot trust them to work autonomously 24/7.
That's why you ain't losing your job, yet.
What LLMs are good at, and their main value I'd argue, is nudging you along and removing the need to implement things that "just take time".
Like some days back I needed to construct a string with some information for a log entry, and the LLM we use suggested a solution that was both elegant and gave a nicer formatted string than what I had in mind. Instead of spending 10-15 minutes on it, I spent 30 seconds and got something that was nicer than what I would have done.
It's these little things that add up and create value, in my opinion.
I was still learning Java at uni (while being a Python/Lisp fanboy) when I realised this:
- Complex and wordy languages need tooling (like autocomplete, autoformatting) to handle the tedious parts.
- Simple and expressive languages can get away with coding in notepad.exe.
- Lisp, as simple and powerful as it is, needs brace highlighting. You. Simply. Just. Can't.
Now 10, 20 years later you can look back at the evolution of many of these languages; some trends I've observed:
- Java, C#, C++, have all borrowed a lot from functional languages.
- JVM has Clojure.
- Go stubbornly insists on "if err != nil" - which isn't even the worst part; the following fmt.Errorf is.
- Rust (post-1.0) cautiously moved towards "?"; Zig (still pre-1.0) also has specific syntax for errors.
- Python is slowly getting more verbose, mostly because of type annotations.
- Autoformatters are actually great, you don't even have to care about indenting code as you spit it out, but... Python, being whitespace-sensitive, makes them a bit less useful.
Good tooling helps you with wordy languages. Expressive languages help you write concise code. Code is read much more often than it's written. Deterministic tooling can work with the structure of the code, but LMs (being probabilistic) can help you work with its intent. Language models are an evolution of automated tooling - they will get better, just like autocomplete got better; but they will never "solve" coding.
In my opinion, there's no truth here, only opinions.
It finally clicked for me when I tried Gemini and ChatGPT side by side. I found that my style of working is more iterative than starting with a fully formed plan. Gemini did well on oneshots, but my lack of experience made the output messy. This made it clear to me that the more chatty ChatGPT was working for me since it seems to incorporate new stuff better. Great for those "Oh, crap I didn't think of that" moments that come up for inexperienced devs like me.
With ChatGPT I use a modular approach. I first plan a high-level concept with o3, then we consider best practices for each part. After that I get the best results with 4o and Canvas, since that model doesn't seem to overthink and change direction as much. Granted, my creations are not pushing up against the limits of human knowledge, but I consistently get clean, maintainable results this way.
Recently I made a browser extension to show me local times when I hover over text on a website that shows an international time. It uses regex to find the text, and I would never have been able to crank this out myself without spending considerable time learning it.
This weekend I made a Linux app to help rice a spare monitor so it shows scrolling cheat sheets to help me memorize stuff. This turned out so well, that I might put it up on GitHub.
For dilettantes like me this opens up a whole new world of fun and possibilities.
However, only maybe 10% of that is agentic coding. Thus, my recommendation would be - try non-agentic tools.
My primary workflow is something that works with the Zed editor, and which I later ported as a custom plugin to Goland. Basically, you first chat with the AI in a sidebar possibly embedding a couple of files in the discussion (so far nothing new), and then (this is the new part) you use contextual inline edits to rewrite code "surgically".
Importantly, the inline edits have to be contextual, they need to know both the content of the edited file, and of the conversation so far, so they will usually just have a prompt like "implement what we discussed". From all I know, only Zed's AI assistant supports this.
With this I've had a lot of success. I still effectively make all architectural decisions, it just handles the nitty-gritty details, and with enough context in the chat from the current codebase (in my case usually tens of thousands of tokens worth of embedded files) it will also adhere very well to your code-style.
You mean the session context awareness? I thought that was a default in all major IDEs/plugins. Or do you mean some specific trait of that feature?
Great for annoying ad-hoc programming where the objective is clear but I lack the time or motivation to do it.
Example: After benchmarking an application on various combinations of OS/arch platforms, I wanted to turn the barely structured notes into nice graphs. Claude Code easily generated Python code that used a cursed regex parser to extract the raw data and turned it into a bunch of grouped bar charts via matplotlib. Took just a couple minutes and it didn't make a single mistake. Fantastic time saver!
This is just an ad-hoc script. No need to extend or maintain it for eternity. It has served its purpose and if the input data will change, I can just throw it away and generate a new script. But if Claude hadn't done it, the graphs simply wouldn't exist.
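To give a flavour of the sort of script this produces, a rough sketch (not the actual generated code; the note format "os/arch benchmark seconds" and the file name are invented):

    # Parse lines like "linux/amd64 sort 1.42" and draw grouped bar charts.
    import re
    from collections import defaultdict
    import matplotlib.pyplot as plt

    pattern = re.compile(r"(?P<platform>\S+/\S+)\s+(?P<bench>\S+)\s+(?P<secs>[\d.]+)")
    results = defaultdict(dict)  # results[bench][platform] = seconds

    with open("notes.txt") as f:
        for line in f:
            m = pattern.search(line)
            if m:
                results[m["bench"]][m["platform"]] = float(m["secs"])

    benches = sorted(results)
    platforms = sorted({p for r in results.values() for p in r})
    width = 0.8 / max(len(platforms), 1)

    for i, plat in enumerate(platforms):
        xs = [j + i * width for j in range(len(benches))]
        ys = [results[b].get(plat, 0.0) for b in benches]
        plt.bar(xs, ys, width=width, label=plat)

    plt.xticks([j + 0.4 for j in range(len(benches))], benches)
    plt.ylabel("seconds")
    plt.legend()
    plt.savefig("benchmarks.png")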
Update: Sorry, missed "writing self-contained throwaway pieces of code"... well for core development I too haven't really used it.
I’ve had great success using LLMs for things that I haven’t done in a while or have never done before. They allow me to build without getting too bogged down in the details of syntax.
Yes, they require constant attention, they are not fully independent or magical. And if you are building a project for the longer run, LLM-driven coding slows down a lot once the code base grows beyond just a couple of basic files (or when your files start getting to about 500-800+ lines)
I’ve tried several agentic editors and tools, including Cursor; they can def be helpful, but I’d rather just manually loop between ChatGPT (o4-mini-high for the most part) and the editor. I get a very quick and tight feedback loop in which I have plenty of control.
Git is essential for tracking changes, and tests are gold once you are at a certain size
> How I program with LLMs
— https://news.ycombinator.com/item?id=42617645
I find them to be super useful for things that I don't already know how to do, e.g. a framework or library that I'm not familiar with. It can then give me approximate code that I will probably need to modify a fair bit, but that I can use as the basis for my work. Having an LLM code a preliminary solution is often more efficient than jumping to reading the docs immediately. I do usually need to read the docs, but by the time I look at them, I already know what I need to look up and have a feasible approach in my head.
If I know exactly how I would build something, an LLM isn't as useful, although I will admit that sometimes an LLM will come up with a clever algorithm that I wouldn't have thought up on my own.
I think that, for everyone who has been an engineer for some time, we already have a way that we write code, and LLMs are a departure. I find that I need to force myself to try them for a variety of different tasks. Over time, I understand them better and become better at integrating them into my workflows.
Aider, in my humble opinion, has some issues with its loop. It sometimes works much better just to head over to AI studio and copy and paste. Sometimes it feels like aider tries to get things done as cheaply as possible, and the AI ends up making the same mistakes over again instead of asking for more information or more context.
But it is a tool and I view it as my job to get used to the limitations and strengths of the tool. So I see my role as adapting to a useful but quirky coworker so I can focus my energy where I'm most useful.
It may help that I'm a parent of intelligent and curious little kids. So I'm used to working with smart people who aren't very experienced and I'm patient about the long term payoff of working at their level.
In particular, with data structures it is garbage: it never understands the constraints that justify writing a new one instead of relying on the ones from the standard library.
And finally, it is incapable of understanding changes of mind. It will go back to stuff already discarded or replaced.
The worst part of all is that it insists on introducing its own "contributions". For example, recently I have been doing some work on ML and I wanted to see the effect of some ablations. It destroyed my code by adding back all the stuff I had removed on purpose.
Overall, it provides small typing/search savings, but it cannot be trusted at all yet.
What works for me is collecting context manually and going one implementation chunk at a time. If it fails, I either do it myself or break it down into smaller chunks. As models got better, these chunks got larger and larger.
Collecting context manually forces me to really consider what information is necessary to solve the problem, and it's much easier to then jump in to fix issues or break it down compared to sending it off blind. It also makes it a lot faster, since I shortcircuit the context collection step and it's easier to course-correct it.
Collecting manually is about 10 seconds of work as I have an extension that copies all files I have opened to the clipboard.
In Plato's Republic, Socrates compares the ability to produce a piece of furniture with the ability to produce the image of a cabinet or so forth with a small compact mirror; what is the difference, if a deceivable crowd doesn't know the difference?
It’s amazing. Better design in terms of UI / UX than I could have fathomed and so much more.
There’s a lot of duplicated code that I’ll clean up, but the site functions and will be launched for clients to start using soon.
For my day job, it’s also helping me build out the software at a faster pace than before and is an amazing rubber duck.
Also, as I understand it, one of the main problems with LLMs right now is applying surgical changes to a large enough code base, or adding extra functionality without breaking/altering existing functionality. Have you faced issues with that?
Another issue is security: I've heard some horror stories of non-tech people developing web solutions, only to later find them destroyed by hackers because they didn't know where to look to find holes in their design.
And as for security, were they even caring about it? Were they asking good questions? Reading about things? Learning ways to secure things?
If someone blindly follows what it gives you, something interesting will happen at some point. Use it as an enhancer.
> Am I just not using the tools correctly?
No, there is no secret sauce and no secret prompting. If LLMs were capable, we'd see lots of new software generated by them, given how fast LLMs are at writing code. Theoretically, assuming a conservative 10 tokens/s and a 100M-token Chromium code base, you could write a new browser with LLMs in only 115 days.
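For reference, the arithmetic behind that 115 days:

    \frac{100{,}000{,}000\ \text{tokens}}{10\ \text{tokens/s}} = 10^{7}\ \text{s} \approx 115.7\ \text{days}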
Firstly, there absolutely are people popping up in certain domains with LLM-assisted products who could not have managed it otherwise, with results you would not suspect were made that way if you were not told.
However, I share the same problem myself. The root of it is "analysis is harder than synthesis". i.e. if you have to be sure of the correctness of the code it's far easier to write it yourself than establish that an LLM got it right. This probably means needing to change how to split things out to LLMs in ways human co-workers would find intolerable.
But just wait for the next doubling of long task capacity (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...). Or the doubling after that. AI will get there.
Write tests for x in the style of this file to cover a, b, c.
Help me find a bug here within this pseudo code that covers three classes and a few functions. Here's the behavior I see, here's what I think could be happening.
I rarely give it access to all the code; I usually give it small portions and ask for small things. I basically treat it as if I were reaching out to another senior developer in a large company, or to SO. They don't care to learn about all the details that don't matter, and want a well-posed question that doesn't waste their time and that they can help with.
Using it this way I absolutely see the benefits and I'd say an arbitrary 1.25x sounds right (and I'm an experienced engineer in my field).
I'll just quietly keep using it this way and ignore the overwhelming hype on both sides (the "it's not a speedup" camp and the "it's 100x" camp). IMO both are wrong, but the "it's not a speedup" camp makes me question how they're using it the most.
It started out promising, renaming the symbols according to my instructions. Slower than if I had done it myself, but not horribly slow. It skipped over a few renames, so I did them manually. I had to tell it to continue every 2 minutes, so I could not really do anything else in the meantime.
I figured it’s quicker if I find the files in question (simple ripgrep search) and feed them to copilot. So I don’t have to wait for it to search all files.
Cool, now it started to rename random other things and ignored the naming scheme I taught it before. It took quite some time to manually fix its mess.
Maybe I should have just asked it to write a quick script to do the rename in an automated way instead :)
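Something in that direction might look like this - the symbol names and the src/ glob are made up for illustration, and you'd review the result with git diff afterwards:

    # One-shot symbol rename across a tree, instead of chat-driven edits.
    import re
    from pathlib import Path

    RENAMES = {
        r"\bOldWidgetFactory\b": "WidgetFactory",   # hypothetical old -> new names
        r"\bmake_old_widget\b": "make_widget",
    }

    for path in Path("src").rglob("*.py"):
        original = path.read_text()
        text = original
        for pattern, replacement in RENAMES.items():
            text = re.sub(pattern, replacement, text)
        if text != original:
            path.write_text(text)
            print(f"updated {path}")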
1. Use a high-quality model with a big context window via API (I recommend Openrouter). E.g. Google Gemini 2.5 Pro is one of the best and keeps consistently good quality (OpenAI reasoning models can be better at problem solving, but they're kind of a mixed bag). Other people swear by the Claude Sonnet models.
2. Upgrade the coding tools you combine with these high-quality models. Google Jules and OpenAI Codex are brand new and have a totally different aim than Cursor; don't use them (yet). Maybe they will get good enough in the future. I would focus on established tools like aider (steepest learning curve) or roo code (easier), paired with Openrouter, and if you want it really easy, claude code (only useful with a 100-200 USD Anthropic subscription IMHO). On average you will get better results with aider, roo, or claude code than with Cursor or Windsurf.
Btw, I think Cursor and Windsurf are great as a starter because you buy a subscription for 15-20 USD and you're set. It's likely that the higher-quality tools burn more tokens and you'll spend more per month, but you also get better quality back in return.
Last but not least and can be applied to every coding assistant: Improve your coding prompts (be more specific in regards to files or sources), do smaller and more iterations until reaching your final result.
When I start to stray into even moderately complex work, LLMs become pretty useless quick. Tell me your setup, and I will give a quick sample task that it will fail at. Stop the fanboyism please
The LLMs advance and so does the quality. What e.g. OpenAIs o3 can do is so much better than what GPT 3.5 can do.
I'm convinced every developer who does not stay up to date with AI coding tools and know how to use them will become obsolete sooner or later.
The usual word is "drivel" rather than "dribble".
OP said, literally, "Am I just not using the tools correctly?", and I said that 2 of the 3 tools OP tried are so new and experimental that I would put them away (for now).
And I use a wide range of models/versions. I mostly use Claude, from 3.5 to the newest 4.0. I also use Gemini, and Copilot.
Roo code is smarter with this problem, Claude Code is also pretty smart with it and in aider the problem does not exist if you as a human understand the project structure and tell aider the important context files to send to the AI.
As a developer buddy - no. LLMs don't actually think and don't actually learn like people do. That part of the overinflated expectations is gonna hit some companies hard one day.
I have collected similar requests over time and I don't have to remind GH copilot/Claude as much anymore.
As the project becomes non-trivial (>1000 lines), they get increasingly likely to get confused. They can still seem helpful, but they may be confidently incorrect, which makes checking their outputs harder. Eventually silly bugs slip through, costing me more time than all of the time LLMs saved previously.
There's also a lot of marketing, and it's cool to hype LLMs; I guess people like to see content about what they can do on YouTube and Instagram.
But then resources became cheap and it stopped mattering. Yeah, tight, well-designed machine code is still some sort of art expression, but for practical purposes it makes sense to write a program in a higher-level language and waste a few MB...
But you trust a C compiler, or a Python interpreter, to do their job in a deterministic way. You will never be able to trust Copilot telling you that "this should be the code you are using".
It may suggest you using AWS, or Google, or Microsoft, or Tencent infrastructure. An LLM can even push you a specific style, or political agenda, without even you realizing it.
I hate the polarized, all-or-nothing thinking about LLMs. See how perfectly and reliably they can translate text in whatever language. See them fail at aligning a table with a monospace font.
An LLM project that can be regenerated from scratch every time is maybe understandable if you use very good prompts and a lot of grounding text. But it is not deterministic unless you use zero temperature and stick with the same model forever - something that's impossible now. Six months ago the state-of-the-art model was DeepSeek R1.
- Using an AI for strange tasks, like using a TTS model to turn snippets of IPA text (for a constructed language) into an audio file (via CLI) - much of the task turned out to be setting up stuff. Gemini was not very good when it came to giving me instructions for doing things in the GCP and Google Workspace browser consoles. ChatGPT was much clearer with instructions for setting up the AWS CLI locally and navigating the AWS browser console to create a dedicated user for the task, etc. The final audio results were mixed, but then that's what you get when trying to beat a commercial TTS AI into doing something it really thinks you're mad to try.
- Working with ChatGPT to interrogate a Javascript library to produce a markdown file summarising the library's functionality and usage, to save me the time of repeating the exercise with LLMs during future sessions. Sadly the exercise didn't help solve the truly useless code LLMs generate when using the library ... but it's a start.
- LLMs are surprisingly good at massaging my ego - once I learned how to first instruct them to take on a given persona before performing a task: <As an English literature academic, analyse the following poem: title: Tournesols; epigraph: (After "Six Sunflowers, 1888" by van Gogh / felled by bombs in 1945); text: This presented image, dead as the hand that / drew it, an echo blown to my time yet // flames erupt from each auburn wheel - / they lick at the air, the desk: sinews // of heat shovelled on cloth. Leaves / jag and drop to touch green glaze - // I want to tooth those dry seeds, sat / by the window caught on the pot's neck // and swallow sunshine. So strong / that lost paint of the hidden man.>
I still fear LLMs, but now I fear them a little less ...
I find that so far their quality is horizontal, not vertical.
A project that involves small depth across 5 languages/silos? Extremely useful.
A long project in a single language? Nearly useless.
I feel like it's token memory. And I also feel like the solution will be deeper code modularisation.
Now with webui, what's important is to constantly add tests around the code base; also, if it gets stuck, go through the logs and understand why.
It's more of a management role of "unblocking" the LLM when it gets stuck and working with it, rather than fitting it into my previous workflow.
What matters:
- the model -> choose the SOTA (currently Claude 4 Opus). I use it mostly in Cursor.
- the prompt: give it enough context to go by, reference files (especially the ones where it can start delving deeper from), be very clear in your intentions. Do bullet points.
- for a complex problem: ask it to break down its plan for you first. Then have a look to make sure it’s ok. If you need to change anything in your plan, now’s the time. Only then ask it to build the code.
- be patient: SOTA models currently aren’t very fast
I work at a company with millions of MAU, as well as do side projects - for the company, I do spend a bit more time checking and cleaning code, but lately with the new models less and less.
For my side projects, I just bang through with the flow above.
Good luck!
Most of the time, the output isn't perfect, but it's good enough to keep moving forward. And since I've already written most of the code, Jules tends to follow my style. The final result isn't just 100%, it's more like 120%, because of those little refactors and improvements I'd probably be too lazy to do if I were writing everything myself.
On the other side, getting a good flow is not trivial. I had to tweak rules, how to describe the problem, how to plan the work, and how to ask the agent. It takes time to become productive.
E.g. asking the agent to create a script to do string manipulation is better than asking it to do an in-place edit, as a script is easier to debug and repeat.
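For example, the kind of shape such a script can take - the transformation itself (normalizing line endings) is just a stand-in; the point is the --dry-run flag that makes it cheap to debug and repeat:

    # Generic "transform files" skeleton with a --dry-run flag for safe repeat runs.
    import argparse
    from pathlib import Path

    def transform(text: str) -> str:
        # Stand-in string manipulation: normalize Windows line endings.
        return text.replace("\r\n", "\n")

    parser = argparse.ArgumentParser()
    parser.add_argument("files", nargs="+")
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args()

    for name in args.files:
        path = Path(name)
        old = path.read_text()
        new = transform(old)
        if new == old:
            continue
        if args.dry_run:
            print(f"would rewrite {path}")
        else:
            path.write_text(new)
            print(f"rewrote {path}")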
My belief is that true utility will make itself apparent and won't have to be forced. The usages of LLMs that provide immense utility have already spread across most of the industry.
Also, the big difference with this tool is that you spend more time planning; don't expect it to one-shot. You need to think about how you go from epic to task first, THEN you let it execute.
I don't suppose there's any solution where you can somehow further train an LLM on your code base, so that it becomes part of the neural net and not part of the prompt?
This could be useful on a large ish code base for helping with onboarding at the least.
Of course you'd have to do both the running and training locally, so there's no incentive for the LLM peddlers to offer that...
On a large code base it would probably work better if you didn't need to put it all in the context... if you even could...
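What this amounts to is local fine-tuning, e.g. training a LoRA adapter on the repo. It does exist, though whether it beats plain context-stuffing is an open question. A very rough sketch of the shape of it, assuming the Hugging Face transformers/peft/datasets stack and a small open code model (every name and hyperparameter below is an illustrative assumption, not a recommendation):

    # Fine-tune a small local code model on a repo's .py files via a LoRA adapter.
    from pathlib import Path

    from datasets import Dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    MODEL = "Qwen/Qwen2.5-Coder-1.5B"  # any small open code model would do

    tok = AutoTokenizer.from_pretrained(MODEL)
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token

    # 1. Turn the repo into a plain-text dataset.
    texts = [p.read_text(errors="ignore") for p in Path("my_repo").rglob("*.py")]
    ds = Dataset.from_dict({"text": texts})
    ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=2048),
                batched=True, remove_columns=["text"])

    # 2. Wrap the base model in a small LoRA adapter so it trains on one GPU.
    model = AutoModelForCausalLM.from_pretrained(MODEL)
    model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

    # 3. Plain causal-LM training; only the adapter weights get saved at the end.
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="repo-lora", num_train_epochs=1,
                               per_device_train_batch_size=1),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    )
    trainer.train()
    model.save_pretrained("repo-lora")

Whether the adapter actually teaches the model your architecture, rather than just your naming style, is exactly the part people are skeptical about.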
I'm not building commercial software and don't have a commercial job at the moment, so I'm kind of struggling with credits; otherwise I would probably blow $40-100 a day.
It writes JUnit tests with mocks, chooses to run them, and fixes the test or sometimes my (actually broken) code.
It’s not helpful for 90% of my work, but like having a hammer, it’s good to have when you know that you have a nail.
Getting the best possible results requires:
- an LLM trained to have the "features" (in the ML/DL sense of the word) required to follow instructions to complete your task
- an application that manages the context window of the LLM
- strategically stuffing the context window with preferences/conventions, design information, documentation, examples, and your git repo's repo map - and making sure you actually use rules and conventions files for projects
Do not assume the LLM will be able to retrieve or conjure up all of that for you. Treat it like a junior dev and lead it down the path you want it to take. It is true there is a bit of micromanagement required, but Aider makes that very simple to do. Aider even makes it possible to scrape a docs page to markdown for use by the LLM. Hooking up an LLM to search is a great way to stuff the context window, BTW; it makes things much simpler. You can use the Perplexity API with Aider to quickly write project plans and fetch necessary docs this way; just turn that into markdown files you'll load up later, after you switch to a proper code-gen model. Assume that you may end up editing some code yourself; Aider makes launching your editor easy, though.
This mostly just works. For fun the first thing I did with Aider was to write a TUI chat interface for ollama and I had something I could post to github in about an hour or two.
I really think Aider is the missing ingredient for most people. I have used it to generate documentation for projects I wrote by hand, I have used it to generate code (in one of my choice languages) for projects written in a language I didn't like. It's my new favorite video game.
Join the Aider discord, read the docs, and start using it with Gemini and Sonnet. If you want local, there's more to that than what I'm willing to type in a comment here but long story short you also need to make a series of correct decisions to get good results from local but I do it on my RTX4090 just fine.
I am not a contributor or author of Aider, I'm just a fanatical user and devotee to its way of doing things.
And given the output I’ve seen when I’ve tried to make it do more, I seriously doubt any of this magic generated software actually works.
They're good at coming up with new code.
Give it a function signature with types and it will give a pretty good implementation (see the sketch below).
Tell it to edit something, and it will lose track.
The write-lint-fix workflow with LLMs doesn't work for me - the LLM monkey-brains it and edits unrelated parts of the code.
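To make that concrete, a toy stub of my own (not from the comment above): hand over exact types plus a docstring pinning down the edge cases, and the body that comes back is usually about like the one shown.

    # The stub you hand over, with a typical model-written body for something this tight.
    from collections import Counter

    def top_k_words(text: str, k: int) -> list[tuple[str, int]]:
        """Return the k most frequent lowercase words in text, ties broken alphabetically."""
        counts = Counter(text.lower().split())
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]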
You are missing a crucial part of the process - writing rules
I'm not yet a fan of Windsurf or Cursor, but honestly Roo Code's out-of-the-box personas for architect, and its orchestration that spins up focused subtasks, work well for me.
I am kinda treating it how I would a junior, to guide it there, give it enough information to do the work, and check it afterwards, ensuring it didn't do things like BS test coverage or write useless tests / code.
It works pretty well for me, and I've been treating prompting these bots just as a skill I improve as I go along.
Frankly it saves me a lot of time; I knocked out some work Friday afternoon in 3 hours that I'd estimate was probably 5 points of effort. I'll take the efficiency any day, as I've had less actual focus time for coding implementations than I used to in my career, due to other responsibilities.
So why are you complaining? I use AI all the time to give me suggestions and ideas. But I write the perfect code myself.
but I think this is solvable when context length goes way higher than current length
TLDR; it works for a codebase of 1M LoC. AI writes code a lot faster, completing tasks in days instead of sprints. Tasks can be parallelized. People code less, but they need to think more often.
(1) Maintain clear and structured architecture documentation (README, DDD context/module descriptions files, AGENTS-MD).
(2) Create detailed implementation plans first - explicitly mapping dependencies, tests, and potential challenges.
(3) Treat the implementation plan as a single source of truth until execution finishes. Review it manually and with LLM-assistance to detect logical inconsistencies. Plan is easier to change, than a scattered diff.
(4) In complex cases - instruct AI agents about relevant documents and contexts before starting tasks.
(5) Approve implementation plans before allowing AI to write code.
(6) Results are better if the code agent can launch automated full-stack tests and review their outputs in the process (a tiny helper in that spirit is sketched below).
The same works for me in smaller projects. Less ceremony is needed there.
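For (6), the hook can be as small as a wrapper the agent is allowed to call; the pytest command and path below are placeholders for whatever your full-stack suite actually is:

    # Run the test suite and hand the combined output back to the code agent.
    import subprocess

    def run_tests(command=("pytest", "-x", "tests/")) -> str:
        result = subprocess.run(list(command), capture_output=True, text=True)
        return f"exit={result.returncode}\n{result.stdout}\n{result.stderr}"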
https://newsroom.ibm.com/2025-05-06-ibm-study-ceos-double-do...
That’s immensely valuable and pretty game changing
I tried to use many LLM tools. They are generally not capable of doing anything useful in a real project.
Maybe solutions like MCP, which allow the LLM to access the git history, will make the LLM useful for someone who actually works on a project.
> I cannot see the Create Project button as user project-admin@domain.com. Please use Supabase MCP to see if I have the correct permissions, if so, are we handling it correctly in the UI?
Core Development Capabilities:
- File Discovery & Navigation: file_explorer with pattern matching and recursive search
- Intelligent Code Search: search_in_file_fuzzy with similarity thresholds for finding relevant code sections
- Advanced Code Editing: file_diff_writer with fuzzy matching that can handle code changes even after refactoring
- Backups: backup and restore of any file at any state of change.
- System Monitoring: Real-time log analysis and container management
- Hot Deployment: docker_rebuild for instant container updates (Claude can do the rebuild)
The Agentic Workflow:
- Claude searches your codebase to understand current implementation
- Uses fuzzy search to find related code patterns and dependencies
- Makes intelligent edits using fuzzy replacement (handles formatting changes)
- Monitors logs to verify changes work correctly
- Restarts containers as needed for testing
- Iterates based on log feedback
- Error handling requires analyzing logs and adjusting parsing strategies
- Performance tuning benefits from quick deploy-test-analyze cycles
I've not had any issues with Claude being able to handle changes, even doing things like refactoring overly large HTML files with inline CSS and JS. Had it move all that to a more manageable layout and helped out by deleting large blocks when necessary.
The fuzzy matching engine is the heart of the system. It uses several different strategies working in harmony. First, it tries exact matching, which is straightforward. If that fails, it normalizes whitespace by collapsing multiple spaces, removing trailing whitespace, and standardizing line breaks, then attempts to match again. This handles cases where code has been reformatted but remains functionally identical.
When dealing with multi-line code blocks, the system gets particularly clever. It breaks both the search text and the target content into individual lines, then calculates similarity scores for each line pair. If the average similarity across all lines exceeds the threshold, it considers it a match. This allows it to find code blocks even when individual lines have been slightly modified, variable names changed, or indentation adjusted.
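In sketch form (simplified, not the actual code, and the 0.8 threshold is just an example), the layered strategy looks something like this:

    # Layered fuzzy matching: exact -> whitespace-normalized -> per-line similarity.
    import difflib
    import re

    def _normalize_ws(s: str) -> str:
        # Collapse runs of spaces/tabs, strip trailing whitespace, unify line breaks.
        lines = s.replace("\r\n", "\n").split("\n")
        return "\n".join(re.sub(r"[ \t]+", " ", ln).rstrip() for ln in lines)

    def fuzzy_find(haystack: str, needle: str, threshold: float = 0.8) -> bool:
        if needle in haystack:                                # strategy 1: exact match
            return True
        if _normalize_ws(needle) in _normalize_ws(haystack):  # strategy 2: normalized match
            return True
        needle_lines = _normalize_ws(needle).split("\n")      # strategy 3: per-line scores
        hay_lines = _normalize_ws(haystack).split("\n")
        window = len(needle_lines)
        for start in range(len(hay_lines) - window + 1):
            scores = [difflib.SequenceMatcher(None, a, b).ratio()
                      for a, b in zip(needle_lines, hay_lines[start:start + window])]
            if sum(scores) / window >= threshold:
                return True
        return False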
Why don’t you consider that the AI will be the one maintaining it?
I write compilers. Good luck getting an LLM to be helpful in that domain. It can be helpful to break down the docs for something like LLVM but not for writing passes or codegen etc
I think it's great for coders of all levels, but jr programmers will get lost once the LLM inevitably hallucinates, and the experts will get gains, but not like those who are in the middle.
1. Improve the code that I already have. Waste of time, it never works. This is not because my code is too good, but because it is SQL with complex context, and I get more hallucinations than usable code; the usable code is good for basic tasks and not for more.
2. Areas I rarely use and I don't maintain an expertise on. This is where it is good value, I get 80% of what I need in 20% of the time, I take it and complete the work. But this does not happen too often, so the overall value is not there yet.
In a way it's like RPA: it does something, not great, but it saves some time.
AI tools can do things faster, but at lower quality. They can't do the _same_ thing faster.
So AI is fine for specifically low quality, simple things. But for anything that requires any level of customization or novelty (which is most software), it's useless for me.
I expect they soon will be able to help me with basic refactoring that needs to be performed across a code base. Luckily my code uses strong types: type safety quickly shows where the LLM was tripping/forgetting.
So the question really comes down to what kind of project you are developing:
Get an MVP fast? LLM is great!
Proof of Concept? LLM rules!
Big/Complex project? LLM junior developer is not up to the task.
Perhaps one day I'll 'incorporate myself' and start posting my solutions and perhaps make some dough.. but I benefit far more than the $20 a month I am paying.
The right 'prompt' (with plenty of specs and controls) saves me from the (classic!) swing-on-tree example: https://fersys.cloud/wp-content/uploads/2023/02/4.jpg
I am using:
And with Vertex: And even: And with Vertex: Neither of them is reliable. It always gives me JSON in the format of a markdown document with a single JSON code block:
Sure, I can strip the code fence, but it's mighty suspicious that I asked for JSON and got markdown. I am getting a huge number of JSON syntax errors, so it's not even getting to the schemas.
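The fence-stripping itself is simple enough - something like this (pure post-processing of the returned string, nothing Gemini-specific, and it obviously doesn't help with the schema problems):

    # Accept either bare JSON or JSON wrapped in a ```json markdown fence.
    import json
    import re

    FENCE = re.compile(r"^\s*```(?:json)?\s*\n(.*?)\n\s*```\s*$", re.DOTALL)

    def parse_model_json(raw: str):
        match = FENCE.match(raw)
        payload = match.group(1) if match else raw
        return json.loads(payload)  # still raises on genuine syntax errors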
When I did get to the schemas, it was occasionally leaving out fields that I'd declared were required (even if, say, null or an empty array). So I had to mark them as not required, since the strict schema wasn't guiding it to produce correct output, just catching it when it did.
I admit I'm challenging it by asking it to produce JSON that contains big strings of markdown, which might even contain code blocks with nested JSON.
If that's a problem, I'll refactor how I send it prompts so it doesn't nest different types.
But that's not easy or efficient, because I need it to return both JSON and markdown in one call. If I want to use "responseMimeType": "application/json" and "responseSchema", then it can ONLY be JSON, and the markdown NEEDS to be embedded in the JSON, not the other way around, and there's no way to return both while still getting JSON and schema validation. I'd hate to have to use tool calls as "out parameters".
But I'm still getting a lot of json parsing problems and schema validation problems that aren't related to nested json formatting.
Are other people regularly seeing markdown code blocks wrapped around what's supposed to be pure JSON, and getting a lot of JSON parsing and schema validation issues?
Otherwise there are libraries like instructor (python) which can help to generate json in markdown code blocks.
I speak in thoughts in my head, and it is better to just translate those thoughts to code directly.
Putting them into language for LLMs to make sense of, and then understanding the output, is oof... too much overhead. And yeah, the micromanagement, correcting mistakes, miscommunications - it's shit.
I just code like the old days, and if I need any assistance, I use ChatGPT.
TLDR, to use those tools effectively you need to change yourself a bit but in a fairly good direction.