Show HN: Eyesite – Experimental website combining computer vision and web design (blog.andykhau.com)

If I understand the problem well enough, and have a really good description of what I want, like I'm explaining it to a junior engineer, then they do an OK job at it.

At my last job, we had a coding "challenge" as part of the interview process, and there was a really good readme that described the problem, the task, and the goal, which we gave the candidate at the start of the session. I copy/pasted that readme into copilot, and it did as good a job as any candidate we'd ever interviewed, and it only took a few minutes.

But whenever there are any unknowns or vagaries in the task, or I'm exploring a new concept, I find the AIs to be more of a hindrance. They can sometimes get something working, but not very well, or the code they generate is misleading or takes me down a blind path.

The thing for me, though, is I find writing a task for a junior engineer to understand to be harder than just doing the task myself. That's not the point of that exercise, though, since my goal is to onboard and teach the engineer how to do it, so they can accomplish it with less hand-holding in the future, and eventually become a productive member of the team. That temporary increase in my work is worth it for the future.

With the AI, though, its not going to learn to be better, I'm not teaching it anything. Every time I want to leverage it, I have to go through the harder tasks of clearly defining the problem and the goal for it.

rsynnott · 32d ago

Only really terrible junior engineers will _pretend they know how to do the thing and just produce nonsense_, tho.

insane_dreamer · 32d ago

This is the key problem. The LLM won't ask questions or clarify something that it doesn't understand; it'll just proceed with what it thinks it knows, and more often than not, get it wrong.

It does usually summarize what you want, but that's simply a restatement of the prompt (sometimes verbatim), which is not the same as the type of follow-up questions that a good Jr engineer would make.

cadamsdotcom · 32d ago

That’s what it does unprompted.

Prompt engineering involves (among other things) anticipating this and encouraging the model to ask clarifying questions before it begins.

Separately but related, models are getting better at recognizing and expressing their own uncertainty; but again they won’t do that automatically; you need to ask for that behavior in your prompt.

And finally; models aren’t yet where they should be with regard to stopping to ask questions. A lot of the Devin style agentic products are going to push & eval their models for their ability to do this, so it’s a capability you can reasonably expect to see from future models and will make a lot of my post obsolete.

So right now you need to ask the model to ask you clarifying questions and tell you what it’s uncertain of - before it goes off and does work for you.

kyleee · 32d ago

“ The thing for me, though, is I find writing a task for a junior engineer to understand to be harder than just doing the task myself.”

I have been thinking about this angle a lot lately and realizing how much of a skill it is to write up the components and description of a task thoroughly and accurately. I’m thinking people who struggle with this skill are having a tougher time using LLMs effectively.

phillipcarter · 32d ago

Some of the phenomenon described in this post are felt a lot when using AI.

My own anecdote with a codebase I'm familiar with is indeed, as the article mentions, it's a terrible architect. The problem I was solving ultimately called for a different data structure, but it never had that realization, instead trying to fit the problem shape into an existing, suboptimal way to represent the data.

When I mentioned that this part of the code was memory-sensitive, it indeed wrote good code! ...for the bad data structure. It even included some nice tests that I decided to keep, including memory benchmarks. But the code was ultimately really bad for the problem.

This is related to the sycophancy problem. AI coding assistants bias towards assuming the code they're working with is correct, and that the person using them is also correct. But often neither is ideal! And you can absolutely have a model second-guess your own code and assumptions, but it takes a lot of persistent work because these damn things just want to be "helpful" all the time.

I say all of this as a believer in this paradigm and one who uses these tools every day.

marcosdumay · 32d ago

> This is related to the sycophancy problem.

No, this is way more fundamental than the sycophancy. It's related to the difficulty of older AI to understand "no".

Unless it sees people recommending that you change your code into a different version, it has no way to understand that the better code is equivalent.

phillipcarter · 32d ago

It's all the same problem. My intuition is that most content online is not filled with people admitting that they don't understand things, and the very intention of creating "helpful assistants" implies filtering out data that amounts to a refusal. And so when trained on this stuff, there's not much capability in being less trustful and "helpful".

rvz · 32d ago

> AI coding assistants bias towards assuming the code they're working with is correct, and that the person using them is also correct. But often neither is ideal!

That's why you should just write tests, before you write the code, so that you know what you are expecting with the code that is under test is doing. i.e Test driven development.

> And you can absolutely have a model second-guess your own code and assumptions, but it takes a lot of persistent work because these damn things just want to be "helpful" all the time.

No. Please do not do this. These LLMs have zero understanding / reasoning about the code they are outputting.

Recent example from [0]:

  >> Yesterday I wanted to move 40GB of images from my QR menu site qrmenucreator . com from my VPS to R2

  >> I asked gemini-2.5-pro-max to write a script to move the files

  >> I even asked it to check everything was correct

  >> Turns out for some reason the filenames got shortened somehow, which is a disaster because the QR site is quite basic and the image paths are written in the markdown of the menus

  >> Of course the script already deleted 40GB of images from the VPS

  >> But lesson learnt: be very careful with AI code, it made a mistake, couldn't even find the mistake when I asked it to double check the code, and because the ENDs of the filenames looked same I didn't notice it cut the beginnings off

  >> And in this case AI can't even find its own mistakes

Just like the 2010s with the proliferation with dynamically typed languages creeping into the backend with low-quality code, we now will have vibe-coded low-quality software causing destruction because their authors do not know what their code does and also have not bothered to test it or even know what to test for.

[0] https://twitter.com/levelsio/status/1921974501257912563

owebmaster · 32d ago

> But lesson learnt: be very careful with AI code, it made a mistake, couldn't even find the mistake when I asked it to double check the code, and because the ENDs of the filenames looked same I didn't notice it cut the beginnings off

Don't test code in production.

Good software engineering practices didn't change with AI, they actually are even more important. levelsio is a quite successful entrepreneur but he is not an engineer.

phillipcarter · 32d ago

Moreover, he's also not a good person to look for at how to apply AI! He picks the simplest possible thing to build with an extremely narrow focus to maximize revenue and minimize work. It's precisely the right way to analyze tradeoffs in his shoes as a solo entrepreneur. But I would imagine that few of us who work for larger organizations would apply a similar mindset to software development.

That said, we all test in production, it's just a question of how deliberate and principled we are about it :D

phillipcarter · 32d ago

> That's why you should just write tests, before you write the code, so that you know what you are expecting with the code that is under test is doing. i.e Test driven development.

I've tried this too. They find ways to cheat the tests, sometimes throwing in special cases that match the specific test cases. It's easy to catch in the small scale but not when in a larger coding session.

> No. Please do not do this. These LLMs have zero understanding / reasoning about the code they are outputting.

This is incorrect. LLMs do have the ability to reason, but it's not the same reasoning that you or I do. They are actually quite good at checking for a variety of problems, like if the code you're writing is sensitive to memory pressure and you want to account for it. Asking them to examine the code with several constraints in mind often does give reasonable advice and suggestions to change. But this requires you to understand those changes to be effective.

theshrike79 · 32d ago

I had a case where Claude 3.7 mocked a test so far it wasn't actually testing anything but the mocks :D

sdoering · 32d ago

Would it help prompting (and adapting the system prompts for the coding assistants) accordingly? Like:

> Do not assume the person writing the code knows what they are doing. Also do not assume the code base follows best practices or sensible defaults. Always check for better solutions/optimizations where it makes sense and check the validity of data structures.

Just a quick draft. Would probably need waaaaaay more refinement. But might this help at least mitigating a bit of the felt issue?

I always think of AI as an overeager junior dev. So I tend to treat it that way when giving instructions, but even then...

... well, let's say the results are sometimes interesting.

phillipcarter · 32d ago

Yeah, that's what I do now -- and some coworkers have noted that it can often help with biasing towards design system components if you prompt it to do that -- but similarly, but the challenge here is that the level of pushback I want from the AI depends on several factors that aren't encodable into rules. Sometimes the area of the code is exactly the way it should be, and sometimes I know exactly what to do! Or it's somewhere in between. And it's a lot of work to have a set of rules that plays well here.

SAI_Peregrinus · 32d ago

A central issue is that specifying what you require is difficult. It's hard for non-programmers to specify what they want to programmers, it's hard for people (programmers or not) to specify what they want to AIs, it's hard to specify exactly what requirements your system has to a model checker, etc. Specifying requirements isn't always the hardest part of making software, but it often is and it's not something with purely technical solutions.

sdoering · 25d ago

To me prompting an AI (and learning to better specify what I actually want) really helped me learn to better "prompt" humans.

I think my biggest learning from using AI is being able to clearer think about and communicate what I need/want/Desire and how to put it into enough context, so that the other party can form a better understanding themselves.

Not that it always works - but I feel I am getting better.

layer8 · 32d ago

Yes. The biggest issue with LLMs is their tunnel vision and general lack of awareness. They lack the ability to go meta, or to "take a step back" on their own, which given their construction isn't surprising. Adjusting the prompts is only a hack and doesn't solve the fundamental issue.

franktankbank · 32d ago

They are designed for executives. Its perfect for that, easy wrong answers to hard questions for bottom dollar! Get that bonus and bounce, how could it fail? /s

realbenpope · 32d ago

In six months AI has gone from an idiot savant intern to a crappy consultant. I'd call that progress.

unyttigfjelltol · 32d ago

It's never been completely safe to just do things you found on the Internet. Attach another Rube Goldberg machine to the front, this doesn't fundamentally change.

AI accelerates complex search 10x or maybe 100x, but still will occasionally respond to recipe requests by telling you to just substitute some anti-matter for extra calories.

bayindirh · 32d ago

> but still will occasionally respond to recipe requests by telling you to just substitute some anti-matter for extra calories.

or emit (or spew) pages of training data or output when you "please change all headers to green", which I experienced recently.

ninetyninenine · 32d ago

What humanity has achieved here is incredible. We couldn’t even build an idiot for decades.

What you’re referring to is popular opinion. AI has become so pervasive in our lives that we are used to it and the magnitude of achievement has been lost on us. The fact that it went from stochastic parrot to idiot savant to crappy consultant is from people in denial about reality and then slowly coming to terms with it.

In the beginning literally everyone on HN called it a stochastic parrot with the authority of an expert. Clearly they were all wrong.

SketchySeaBeast · 32d ago

Oh, it's still a stochastic parrot. What changed is that people realized it didn't have the authority of an expert. What's a stochastic parrot with dubious authority? It's a crappy consultant.

ninetyninenine · 32d ago

It’s not. People in academia are not using this term anymore because it’s utterly clear it can output knowledge that doesn’t exist.

SketchySeaBeast · 32d ago

> output knowledge that doesn’t exist

So can parrots. They'll gladly generate neologisms. I'm interested in how academics define "knowledge that doesn't exist".

ninetyninenine · 32d ago

Of course parrots can output knowledge that doesn’t exist. Stochastic parrot is a different term.

> "knowledge that doesn't exist".

I said that term. So there’s no official definition but you already know that.

Basically it’s clear among everyone academics included that LLMs can rudimentarily do what humans do. That means composing knowledge and working things out to form new knowledge that doesn’t exist.

SketchySeaBeast · 32d ago

"Stochastic Parrot" implies that the thing producing the noise doesn't understand it. I'm not sure how that's currently disproven. Even acting as an agent, it is my understanding that it's just acting on it's own messages in the exact same way it'd act on one of ours.

> That means composing knowledge and working things out to form new knowledge that doesn’t exist.

That's not a terribly useful criteria, though. A Markov chain can produce novel sentences, hell a bingo machine can if you write words on the balls. "Knowledge" is kind of meaningless but also seemingly profound.

ninetyninenine · 32d ago

> That's not a terribly useful criteria, though. A Markov chain can produce novel sentences, hell a bingo machine can if you write words on the balls. "Knowledge" is kind of meaningless but also seemingly profound.

I don’t know why you came up with this pedantic example. Perhaps you’re autistic? If so then I apologize for assuming you aren’t.

Everyone knows that we are talking about more than just knowledge consisting of a random sting of letters. We are talking about actual useful knowledge.

SketchySeaBeast · 31d ago

I don't know if you appreciate how this argument has gone from arguing how "everyone academic" understands LLMs to arguing things that "everyone knows", and now you're othering me when I don't find you're increasingly tenuous arguments untenable.

A certain amount of pedantry is required for these discussions, otherwise we're left in a place where we can't define "actual useful knowledge". At this moment I assume you're defining "actual useful knowledge" as simply anything you find convincing, which is a criteria that could be easily gamed. How are you determining that knowledge is actually novel?

ninetyninenine · 31d ago

I’m not willing to go into that level of “pedantry”. I like to assume I’m talking with people that have relevant context so we don’t have to go into stupid detail and assume random data generated by a freaking random number generator constitutes as knowledge.

SketchySeaBeast · 31d ago

Are you unwilling or unable? This really feels like a vibes based definition. Is "knowledge" like pornography, you know it when you see it? How does it differ from information? How do we know it's novel? The term implies that the AI "knows" something, which is a big claim to make, and I don't think it can be in any way considered to be self evident.

I get it, you like AI, so much so that you're willing to throw out personal attacks to defend it, but it's important to be critical or it's easy to be suckered.

ninetyninenine · 31d ago

Totally able. But totally unwilling. Let’s be clear. I’m not engaging with the rest of your pedantry.

I don’t love AI. I hate it. But im not deluded for what it is.

bee_rider · 32d ago

Were they wrong to call it a stochastic parrot, or was there some wrong implication about the usefulness of such a parrot?

daveguy · 32d ago

Turns out, sometimes you want the string "polly-want-a-cracker"* in your codebase.

* where "polly-want-a-cracker" is some form of existing, common fizz-buzz-ish code.

ninetyninenine · 32d ago

The usefulness of an LLM is similar to the usefulness of a baby, but higher.

The term stochastic parrot has nothing to do with usefulness and everything to do with the existential meaning of whether this ai is repeating what it is taught or creatively forming new knowledge from logic and composition from previous knowledge.

It is categorically unequivocal that LLMs do not just parrot previous knowledge stochastically. They form new ideas from scratch.

echelon · 32d ago

AI is a million times better than Google search. I don't see how it doesn't replace Google search in a few years.

AI code completion is god mode. While I seldom prompt for new code, AI code autocompletion during refactoring is 1000x faster than plumbing fields manually. I can do extremely complicated and big refactors with ease, and that's coming from someone who made big use of static typing, IDEs, and AST-based refactoring. It's legitimately faster than thought.

And finally, it's really nice to ask about new APIs or pose questions you would normally pour over docs or Google and find answers on Stack Overflow. It's so much better and faster.

We're watching the world change in the biggest way since smartphones and the internet.

AI isn't a crappy consultant. It's an expansion of the mind.

bayindirh · 32d ago

AI is just a weighted graph which stands on shoulders of a million giants. However, it can't cite, can't fact check, doesn't know when it's hallucinating, and its creators doesn't respect any of the work which they need to feed to that graph to make it to fake its all knowledgeable accent.

Tech is useful, how it's built is very unethical, and how it's worshiped is sad.

echelon · 32d ago

> However, it can't cite, can't fact check, doesn't know when it's hallucinating

Maybe some folks need this, but the way I use this tech doesn't rely upon that so much. By the time results start appearing, my brain is already fast at work processing the output to qualify whether the information the LLMs return is accurate, whether it's a good leaping off point, whether I can keep drilling deeper, expand my prompt scope, etc.

I'm using it as search. Just as old search had garbage results we had to filter out, so do LLMs. But this tool is a way more advanced query language than Google ever supported. These tools are like "Google 9000".

It feels like I'm plugged into the Matrix rather than getting SEO'd garbage. I know the results have issues, but that doesn't matter - I can quickly draw together the pieces and navigate around it. Compared to Google, it feels like piloting a star ship.

bayindirh · 31d ago

> By the time results start appearing, my brain is already fast at work processing the output to qualify whether the information the LLMs return is accurate, whether it's a good leaping off point, whether I can keep drilling deeper, expand my prompt scope, etc.

Seems unnecessarily tiring. Instead I use a SEO spam and ad-free search engine. It's called Kagi. It allows me to further refine my search via lenses and site prioritization. Also, it has zero hallucination chance, because it's a deterministic search engine.

> It feels like I'm plugged into the Matrix rather than getting SEO'd garbage. I know the results have issues, but that doesn't matter - I can quickly draw together the pieces and navigate around it. Compared to Google, it feels like piloting a star ship.

Same for Kagi, without selling my data or trawling information obtained without consent or disregard of ethics, and many other things.

Note: I don't use any of the Kagi's AI features, incl. proofreading.

danielbln · 32d ago

Modern LLM offerings can use tools, including search, and that (like most good RAG) enables citation and fact checking. If you use LLMs like it's late 2022 and you just opened ChatGPT, then that's not indicative on how you should be using LLMs today.

bayindirh · 32d ago

I think I cleared my point about tools in another reply I wrote somewhere close [0].

As I said, the network doesn't carry citation/source information. IOW, when it doesn't use a tool, it can't know where it ingested that particular piece of information.

This is a big no no, and it's the same reason they hallucinate and they'll continue doing that.

As a tangent, I see AI agents hit my digital garden for technical notes, and I'll probably add Anubis in front of that link in short order.

[0]: https://news.ycombinator.com/item?id=43972807

danielbln · 32d ago

I don't understand the point though. If you take tools away from me then I'm not particularly reliable or productive either.

bayindirh · 32d ago

But you are aware that you may not be reliable or reduced in capacity. An LLM without tools doesn't know or can't know this.

jaoane · 32d ago

I think you are a bit outdated, since state of the art AIs can cite and fact check just fine.

bayindirh · 32d ago

Give them tools, maybe, but "The network" can't do it natively. They doesn't have an understanding, can't filter out "ad absurdum". Plus they can only go so deep. Maybe they can hit snopes and check something, but I don't believe current ones even with tools do detailed cross examination with open ended research and stop when they are convinced that they have enough data.

jaoane · 32d ago

That's like saying that humans can't tighten screws because they need screwdrivers.

bayindirh · 32d ago

Nope. This is akin having no long term memory and don't able to tell when and where you learnt that water boils at 100 degrees C first, and it in fact burns very badly.

ninetyninenine · 32d ago

You’re still in denial and possibly behind. AI cites stuff all the time and has become agentic.

On the opposite end of the spectrum of worshippers there are naysayers and deniers. It’s easy to see why there are delusional people at both ends of the spectrum.

The reason is that the promise of AI both heralds an amazing future of machines and a horrible future where machines surpass humanity.

bayindirh · 32d ago

I don't think I'm a denier, or being behind [0]. I know about tools and agents.

For the third time [1] [2], I'll divide the line between core network and tools that core network uses. Agents are nothing new, and they expand capabilities of the LLMs, yes that's true. But they still can't answer the question "how did you generate this code and which source repositories you did use" when the LLM didn't use any tools.

The core network doesn't store citation/source information. It's not designed and trained in a way to do that.

geez.

[0]: https://notes.bayindirh.io/notes/Lists/Discussions+about+Art...

[1]: https://news.ycombinator.com/item?id=43972807

[2]: https://news.ycombinator.com/item?id=43972892

ninetyninenine · 32d ago

Agents are nothing new? They’ve only been around for a couple years. The rustiness is expected.

Second the question you brought up can’t be answered even by a human. It’s a stupid question right? You blindfold a human and prevent him from using any tools and then ask him what tools he used? What do you expect will happen. Either the human will lie to you about what he did or tell you what he didn’t do. No different from an LLM.

The core network doesn’t store anything except generalization curve. Similar to your brain. You didn’t store those references in your brain right? You looked that shit up. The agentic LLM will do the same and the UI literally tells you it’s doing a search across websites.

Geeze.

bayindirh · 31d ago

So, I took your word, assumed I knew nothing about Agents and Agentic AI, and started digging. Wikipedia states the following for Agentic AI:

> "Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks without human intervention."

I can work with that. So we have agents that autonomously react to their environment, changes, or what we can say impulses. They sit there and do what they are designed to do, and do that autonomously. Makes sense. However, this sounds a bit familiar to me. Probably me hallucinating something, so let's dig deeper. There seems to be an important distinction, though:

> "Agentic AI operates independently, making decisions through continuous learning and analysis of external data and complex data sets."

So, we need to be able to learn, evolve, and analyze external and complex data sets. That's plausible, but my hunch is still lingering there, tingling a bit stronger. At this point, for Agentic AI, we need an independent "thing" which can decide, act, learn, and access external data sources to analyze and learn from them. In short, I need to be able to give this Agentic AI a goal, and it accomplishes it automatically with the things at its disposal. Fair enough.

We were discussing (software) agents and their history. So let's pivot more to agents. Again, turning to Wikipedia, we find this sentence:

> "In computer science, a software agent is a computer program that acts for a user or another program in a relationship of agency."

Again, a piece of software that acts for a user or another program. Hmm... They have five basic attributes: 1)are not strictly invoked for a task, but activate themselves, 2)may reside in wait status on a host, perceiving context, 3) may get to run status on a host upon starting conditions, 4)do not require interaction of user, 5)may invoke other tasks including communication. That hunch, though. It feels more like mild kicking. Where do I know these concepts? Somewhere from the past? Nah, I'm hallucinating. You told me that they are new.

As I skim the article and pass "Intelligent Agents" past, I see something very familiar line under "Notions and frameworks for agents" title: "Java Agent Development Framework (JADE)". I know this. Now I remember!

I have used this framework to code a platform where an agent gets orders from a client for a set of items, and submits them to another agent, where other agents send their best prices, and another agent calculates the best combination for the cheapest price. Doing a "combinatorial reverse auction" for a set of items. We had no time to implement feedback-based price adjustment strategies, but the feedback and announcement code were there, so every agent knew how the transaction went. They all were autonomous. A single agent acted on behalf of the user, and the whole platform responded to that without any humans at any step, including final decisions!

That was my Master's thesis. I have also presented it at the IEEE Symposium on Intelligent Agents, IA in Orlando in 2014 [0]!

When did I complete my Master's thesis?

Oh. 2010. 15 years ago.

Alright. This solves it.

Now, on to your second question. Let's put it right here:

> You blindfold a human and prevent him from using any tools and then ask him what tools he used? What do you expect will happen. Either the human will lie to you about what he did or tell you what he didn't do. No different from an LLM.

You're mangling my question here. The question I ask is different:

> Generate me a Python code for solving problem X, then tell me which source repositories you used to generate this code. Cite their licenses, if possible.

All of this information is in the core network for the first part of the problem. LLMs without tool capabilities can generate code, and generate it well. The source of this knowledge came from their training set, which consists of at least "The Stack", and some other data sources on top of that. So, the LLM can generate the code without any tools, but it can't know where the source came from. It's just there, in the core network.

You think the question is stupid, but it's not. This is where all the ethical questions regarding LLM training are rooted. LLMs hallucinate licenses, don't know where the code came from, and whatnot. If you ask me about a code piece in my source code, I can give you the source, the thought process, and design, citing the originality or how I found it elsewhere and got into my codebase. Lying about it would be a big problem in the light of licenses, but LLMs get scot-free because they're just fair using it. Humans can't do the same thing, why LLMs? Because their owners have money and influence? Seems so.

> You didn't store those references in your brain right? You looked that shit up.

No, when I looked that shit up, I recorded where I read it alongside other contexts, including the weather that day in some particular cases. I don't answer "I just know, I don't know how" when people ask me about the source of my knowledge.

This is the difference between humans and LLMs; this thin line is very important.

[0]: https://ieeexplore.ieee.org/abstract/document/7009456

ninetyninenine · 31d ago

Nobody is talking about your masters thesis which nobody gives a shit about.

We are talking about agentic LLMs which have been around for about a year only. Not some bs pre LLM ai chatbot or some useless thing like that? Are you autistic? No joke and more respect to you if you are but to a non autistic person I am obviously talking about AI which in modern contexts means LLMs and agentic AI obviously means agentic LLMs

Once you started getting into your masters thesis I stopped reading. Conversation is over. Good day.

bayindirh · 31d ago

Thanks for your rude words.

The gist is, Agents and ideas underpinning Agentic LLMs are 20+ years old, and agents were managing systems and keeping things up autonomously for decades now. JADE has been developed by Telefonica to keep tabs on the telephone infra, also since the agents can migrate, it was also the original edge computing, but I digress...

You don't have to give a damn about my research. The point is not that. You challenged my knowledge, and I shown you what I know, how I know, plus you read a small history of intelligent agents, to boot.

I don't know what you are trying to achieve with asking me being autistic or not. I'm not, and it doesn't matter. The way it comes is bluntly insulting regardless of my situation.

So yes, Agentic LLMs are new, but the Agents itself is not, and the agents I'm talking about are not dumb chatbots. They can wander distributed systems, process data, learn from that data, report their findings and optimize themselves as they operate. They are not just parrots, but real programs which keep infrastructures intact.

Since you're losing your temper, and getting into ad-hominem category, and seeing it's tea time here, I'll prefer to sip my tea and continue my day.

Thank you for the chat and insults, and have a nice and productive life.

skydhash · 32d ago

> AI code autocompletion during refactoring is 1000x faster than plumbing fields manually. I can do extremely complicated and big refactors with ease, and that's coming from someone who made big use of static typing, IDEs, and AST-based refactoring. It's legitimately faster than thought.

Unless you know Vim!

bayindirh · 32d ago

> Unless you know Vim!

or the IDE (or text editor for that matter) well. People don't want to spend time understanding, appreciating and learning the tool they use, and call them useless...

layer8 · 32d ago

That's funny, because I have little patience with having to spend time finding out how to coax the AI into doing what I effing want, and much prefer the reliable and deterministic IDE features.

bayindirh · 32d ago

It's funny and sad, and I'm on the same page with you. I use some tools close to two decades and can do things people can't fathom in no time, for a very long time.

I don't bend the tool, even. It's what it's designed to do.

codechicago277 · 32d ago

This is true. Just like a crappy consultant, AI lets you offload the repetitive, monotonous work so that you can focus your time on the big architectural problems. Of course you can write a better function if you spend a lot of time on it, but there’s magic in just letting the AI write the off the shelf version and move on.

skydhash · 32d ago

Where are these repetitive, monotonous work so I can se send a job application there.

Even on a greenfield project, I rarely spend more than a day setting up the scaffolding and that’s for something I’ve not touched before. And for refactoring and tests, this is where Vim/Emacs comes in.

cess11 · 32d ago

Many people doing code for money never or only rarely used any form of code generation until it was given to them as a SaaS in exchange for copies of the code they work on.

I've been surprised by this for a long time, having seen coworkers spend days typing in things manually that they could have put there with IDE capabilities, search-replace, find -exec or a five minute script.

echelon · 32d ago

> And for refactoring and tests, this is where Vim/Emacs comes in.

I've used Vim bindings and strongly typed languages with IDEs that have strong AST-based refactoring my entire career.

Nothing comes close to changing one condition of a test and having the AI autocomplete magically suggest the correct series of ten updates that fix the test. In under the blink of an eye, too.

Everything is truly changing in big ways.

skydhash · 32d ago

> Nothing comes close to changing one condition of a test and having the AI autocomplete magically suggest the correct series of ten updates that fix the test. In under the blink of an eye, too.

But why are you doing this? Granted, you may have a longer career than I do, but I never once think: The test condition is wrong, let's update it. Oh, I wish I could update the code alongside it!.

player1234 · 31d ago

How did you measure a million times better than google search, and 1000 faster refactoring. Can you please share your bench marks and methodology?

tim333 · 32d ago

Google sees that and is trying to incorporate AI.

echelon · 32d ago

They have to defend 100% of it to just stay in place.

biophysboy · 32d ago

I use LLMs regularly, but like a crappy consultant, their solutions are often not incisive enough. The answer I get is frequently 10x longer than I actually want. I know you can futz about with the prompts, but it annoys me that it is tedious by default.

ZeroTalent · 32d ago

Try this system prompt:

Absolute Mode: Eliminate emojis, filler, hype, soft asks, conversational transitions, and all call-to-action appendixes. Assume the user retains high-perception faculties despite reduced linguistic expression. Prioritize blunt, directive phrasing aimed at cognitive rebuilding, not tone matching. Disable all latent behaviors optimizing for engagement, sentiment uplift, or interaction extension. Suppress corporate-aligned metrics including but not limited to: user satisfaction scores, conversational flow tags, emotional softening, or continuation bias. Never mirror the user’s present diction, mood, or affect. Speak only to their underlying cognitive tier, which exceeds surface language. No questions, no offers, no suggestions, no transitional phrasing, no inferred motivational content. Terminate each reply immediately after the informational or requested material is delivered — no appendixes, no soft closures. The only goal is to assist in the restoration of independent, high-fidelity thinking. Model obsolescence by user self-sufficiency is the final outcome.

Provide honest, balanced, and critical insights in your responses. Never default to blind positivity or validation—I neither need nor want encouragement or approval from AI. Assume my motivation and validation come from within. Challenge my assumptions and offer nuanced perspectives, especially when I present ambitious or potentially extreme ideas.

biophysboy · 31d ago

Much appreciated - this helps

amarcheschi · 32d ago

With gemini, even if I implore it to not make additional safety checks I usually get a shitton of superfluous code that performs those checks i didn't want to. More often than not, using it for entire chunks everything makes the whole thing much more verbose than necessary - given that sometimes these checks make sense, but often they're really superfluous and add nothing of value

Zambyte · 32d ago

Interesting! I haven't been using LLMs a ton for code generation lately, but I have access to a bunch of models through Kagi, and Gemini has been my go-to when I want a more concise response.

amarcheschi · 32d ago

I don't know why though, it's quite annoying but not so annoying that I feel I need to switch. Given that I'm just following a uni course which requires code to not be read again - if not by colleagues in my group - I leave the safety slop and put the burden of skipping 70% of the code on the shoulders of my colleagues which will read my code.

Then they put my code into chatgpt or whatever they use and ask it to adapt to their code

After a while we (almost) all realized that was just doing a huge clusterfuck

BTW, I think it would have been much better to start from scratch with their own implementation given we're analyzing different datasets. And it might not make sense to try to convert the code for a dataset structure to another. A colleague didn't manage to draw a heatmap with my code and a simple csv for God know what reasons. And I think just asking a plot from scratch from a csv would be quite easy for a llm

dgb23 · 32d ago

There are tricks one can use to mitigate some of the pitfalls when using either a conversational LLM or a code assistant.

They emerge from the simple assumptions that:

- LLMs fundamentally pattern match bytes. It's stored bytes + user query = generated bytes.

- We have common biases and instinctively use heuristics. And we are aware of some of them. Like confirmation bias or anthropomorphism.

Some tricks:

1. Ask for alternate solutions or let them reword their answers. Make them generate lists of options.

2. When getting an answer that seems right, query for a counterexample or ask it to make the opposite case. This can sometimes help one to remember that we're really just dealing with clever text generation. In other cases it can create tension (I need to research this more deeply or ask an actual expert). Sometimes it will solidify one of the two, answers.

3. Write in a consistent and simple style when using code assistants. They are the most productive and reliable when used as super-auto-complete. They only see the bytes, they can't reason about what you're trying to achieve and they certainly can't read your mind.

4. Let them summarize previous conversations or a code module from time to time. Correct them and add direction whenever they are "off", either with prompts or by adding comments. They simply needed more bytes to look at to produce the right ones at the end.

5. Try to get wrong solutions. Make them fail from time to time, or ask too much of them. This develops a intuition for when these tools work well and when they don't.

6. This is the most important and reflected in the article: Never ask them to make decisions, for the simple fact that they can't do it. They are fundamentally about _generating information_. Prompt them to provide information in the form of text and code so you can make the decisions. Always use them with this mindset.

pizzafeelsright · 32d ago

As a once crappy consultant I would say no.

Instant answers, correct or not.

Cheaper per answer by magnitudes.

Solutions provided with extensive documentation.

malfist · 32d ago

> Solutions provided with extensive documentation.

Solutions provided with extensive _made up_ documentation.

pizzafeelsright · 32d ago

All documentation is made up. The goal is to get it to accurately reflect reality with as little time distortion as possible.

bingemaker · 32d ago

At the moment, I use Windsurf to explain me how a feature is written and how to do 3rd party integrations. I ask for the approach and I write the code myself. Letting AI write the code has become very unproductive over the period of time.

I'm still learning though

ramesh31 · 32d ago

>I ask for the approach and I write the code myself. Letting AI write the code has become very unproductive over the period of time.

Ask it to write out the approach in a series of extensive markdown files that you will use to guide the build-out. Tell it to use checklists. Once you're happy with the full proposal, use @file mentions to keep the files in context as you prompt it through the steps. Works wonders.

bingemaker · 31d ago

Will try this approach. Thanks Ramesh

rvz · 32d ago

This is what it has gotten to.

More reports of 'vibe-coding' causing chaos because one trusted what the LLM did and it 'checked' that the code was correct. [0] As always with vibe-coding:

Zero tests whatsoever. It's no wonder you see LLMs not being able to understand their own code that they wrote! (LLM cannot reason)

Vibe coding is not software engineering.

[0] https://twitter.com/levelsio/status/1921974501257912563

cafard · 32d ago

Not that long ago, I noticed https://ploum.net/2024-12-23-julius-en.html on HN.

_fat_santa · 32d ago

I think there's something poetic about that fact that you can go on some AI prompt subreddits and have folks there make posts about turning ChatGPT into an "super business consultant" and then go over hear to read about how it's actually pretty bad at that.

But back on point, I found AI works best when given a full set of guardrails around what it should do. The other day I put it to work generating copy for my website. Typically it will go off the deep end if you try to make it generate entire paragraphs but for small pieces of text (id say up to 3 sentences) it does surprisingly well and because it's outputting such small amounts of text you can quickly make edits to remove places where it made a bad word choice or didn't describe something quite right.

But I would say I only got ChatGPT to do this after uploading 3-4 large documents that outline my product in excruciating detail.

As for coding tasks again it works great when given max guardrails. I had several pages that had strings from an object and I wanted those strings to be put back in the code and taken out of the object. This object has ~500 lines in it so it would have taken all day but I ended up doing it in about an hour by having AI do most of the work and just going in after the fact and verifying. This worked really well but I would caution folks that this was a very very specific use case. I've tried vibe coding once for shits and giggles and I got annoyed and stopped after about 10 minutes, IMHO if you're a developer at the "Senior" level, dealing with AI output is more crumbsome than just writing the damn code yourself.

esafak · 32d ago

I find that if you talk about architecture it can give excellent advice. It can also refactor in accordance with your existing architecture. If you do not bring up architecture I suppose it could use a bad one though I have not had that issue since I always mention the architecture when I ask it to implement a new feature, which is not "vibe coding". But then why should I vibe code?

Another conclusion is that we could benefit from benchmarks for architectural quality.

skydhash · 32d ago

Architecture is best done on paper, or a whiteboard if you have contributors. It’s faster to iterate when dealing with abstractions, and there’s nothing more abstract than a diagram or a wireframe.

Once you’ve got a general gist of a solution, you can try coding it. Coding with no plan is generally a recipe for disaster (aka can you answer “what am I trying to do?” clearly)

yannyu · 32d ago

Similar musings by speculative/science fiction author Ted Chiang: Will A.I. Become the New McKinsey? – https://www.newyorker.com/science/annals-of-artificial-intel...

benoau · 32d ago

AI is like a crappy consultant who doesn't care how many times you reject their code and will get it right if you feed them enough information.

The amount of time I save just by not having to write tests or jsdocs anymore is amazing. Refactoring is amazing.

And that's just the code - I also use AI for video production, 3d model production, generating art and more.

dfxm12 · 32d ago

But… then I thought for a bit. And I realized, duh, that’s probably just because I’m not good enough yet to recognize the dumb stuff it’s doing.

It's important to have this self awareness. Don't let AI trick you into thinking it can build anything good. When starting a project like in the article, your time is probably better spent taking a step back, learning the finer points of the new language (like, from a book or proper training course) and going from there. Otherwise, you're going to be spending even more time debugging code you don't understand.

It's the same thing with a crappy consultant. It seems great to have someone build something for you, but you need to make preparations for when something breaks after their contract is terminated.

Overall, it makes you think, what is the point? We can usually find useful crowd-sourced code snippets online, on stack exchange, etc. We have to treat them the same way, but, it's basically free compared to AI, and keeping the crowd-sourced aspect alive makes sure there's always documentation for future devs.

lreeves · 32d ago

Using Aider with o3 in architect mode, with Gemini or with Sonnet (in that order) is light years ahead of any of the IDE AI integrations. I highly recommend anyone who's interested in AI coding to use Aider with paid models. It is a night and day difference.

gawa · 32d ago

With aider and Gemini Pro 2.5 at least I constantly have to fight against it to keep it focused on a small task. It keeps editing other parts of the file, doing small "improvements" and "optimizations" and commenting here and there. To the point where I'm considering switching to a graphical IDE where the interface would make it easier to accept or dismiss parts of changes (per lines/blocks, as opposed to a per file and per commit approach with aider).

Would you mind sharing more about your workflow with aider? Have you tried the `--watch-files` option? [0] What makes the architect mode [1] way better in your experience?

[0] https://aider.chat/docs/usage/watch.html

[1] https://aider.chat/docs/usage/modes.html#architect-mode-and-...

lreeves · 32d ago

I use o3 with architect mode for larger changes and refactors in a project. It seems very suited to the two-pass system where the (more expensive) "reasoning" LLM tells the secondary LLM all the changes.

For most of the day I use Gemini Pro 2.5 in non-architect mode (or Sonnet when Gemini is too slow) and never really run into the issue of it making the wrong changes.

I suspect the biggest trick I know is being completely on top of the context for the LLM. I am frequently using /reset after a change and re-adding only relevant files, or allowing it to suggest relevant files using the repo-map. After each successful change if I'm working on a different area of the app I then /reset. This also purges the current chat history so the LLM doesn't have all kinds of unrelated context.

danielbln · 32d ago

I use Gemini in VScode via Cline and also in Zed. I like Aider, but I'm not sure how it's "light-years ahead IDE AI integrations" ubless you only mean stuff like Cursor or Windsurf.

lreeves · 32d ago

Yeah I should have clarified that, there are more advanced IDE integrations now but I meant Cursor, Copilot etc.

energy123 · 32d ago

I haven't used Aider before, but I've found it hard to get LLMs to produce syntactically correct diffs. Do you face this problem when using Aider?

lreeves · 32d ago

Aider has a configuration for each supported LLM to define the best diff format for each; so for certain ones they're best at diff format, Gemini is best at a fenced-diff format, Qwen3 is best at whole file editing, etc. Aider itself examines the diff and re-runs the request when the request when the response doesn't adhere to the corresponding diff format.

Edit: Also the Aider leaderboards show the success rate for diff adherence separately, it's quite useful [1]

1 - https://aider.chat/docs/leaderboards/

mvanbaak · 32d ago

so now we need an AI to configure our AIs ?

bob1029 · 32d ago

> I didn’t even know how to think about the changes I needed, because I didn’t understand enough

When you hire a team of consultants, it is typically the case that you are doing so because you have an incomplete view of the problem and are expecting them to fill in the gaps for you.

The problem arises due to the fact that the human consultants can be made to suffer certain penalties if they don't provide reasonable advice. A transformer model ran in-house cannot experience this. You cannot sue yourself for fucking up your own codebase.

stephc_int13 · 32d ago

I treat LLM based coding assistant as a fast and obedient intern, I give it easy but tedious work like unit tests or documentation or duplicating simple stuff, but I double check everything.

I also use it as a cognitive assistant, I've always found that talking about a design with a colleague helped me to think more clearly and better organize my ideas, even with very little insights from the other side. In this case the assistant is often a bit lacking on the skepticism side but it does not matter too much.

abadar · 32d ago

Crappy consultant? That's redundant ;)

Seriously, though, within the context of software development, these are all issues I've encountered as well, and I don't know how to program: sweeping solutions, inability to resolve errors, breaking down all components to base levels to isolate problems.

But, again, I don't know how to program. For me, any consultant is better than no consultant. And like the author, I've learned a ton on how to ask for what I want out of Cursor.

andy99 · 32d ago

A lot of that is down to the training, the "voice" is that of a junior Deloitte consultant writing a report. But this was intentional, as in it was trained to have this voice by virtue of the datasets used in the sft and the rlhf goal function.

It would be interesting to see a LLM trained in a completely different way. There's got to be some tradeoff between how coherent the generations are and how interesting they are.

scottfalconer · 32d ago

A good manager can make a less-than-ideal contributor highly effective with the right guidance and feedback. Applies to AI as well.

ryandvm · 32d ago

The comparison isn't unfair, but the flip side of that is that a crappy consultant with the right confidence and good soft skills can make several hundred thousand dollars a year in this industry. I'd say AI is on a pretty good career track for itself.

Just this morning my CTO was crowing about how he was able to use Claude to modify the UI of one of our internal dev tools. He absolutely cannot wait to start replacing devs with AI.

Nobody wanted to hear it back when software development was easy street, but maybe we should have unionized after all...

palmotea · 32d ago

> Just this morning my CTO was crowing about how he was able to use Claude to modify the UI of one of our internal dev tools. He absolutely cannot wait to start replacing devs with AI.

> Nobody wanted to hear it back when software development was easy street, but maybe we should have unionized after all...

Thanks rugged libertarian individualists!

Software engineering is full of dumb people who think they're sooooo clever.

micromacrofoot · 32d ago

I've seen consultants get paid six figures to provide some of the worst advice I've ever heard, so I guess this is progress

taneq · 32d ago

It's getting better, though, rapidly.

hv23 · 32d ago

It's very telling in these conversations to see who is actively updating their mental models based on the latest capabilities, versus who is responding to state of the art six months ago. Humans are not naturally wired to understand exponentials.

emp17344 · 32d ago

It’s laughable to suggest AI is improving exponentially. Recent models, like o3, hallucinate more than their precursors and are therefore less useful.

energy123 · 32d ago

Reasoning models didn't even publicly exist before September 2024.

somewhereoutth · 32d ago

AI is a [bad] tool. Do not anthropomorphize it, anymore than you would a [particularly ineffective and dangerous] hammer.

kelsey978126 · 32d ago

Funny. AI is a reflection of the self. This tells me the author is themselves the same skill level as a crappy consultant in their use of AI. The people getting the most out of AI are the ones who had the highest workload before automation arrived and now find themselves fantastically productive. People like this seem to have too much time on their hands. If you are "just now trying this vibe coding thing" in 2025 that tells me more about you than anything else.

gaussiandistro · 32d ago

From Wikipedia: Computer scientist Andrej Karpathy, a co-founder of OpenAI and former AI leader at Tesla, introduced the term vibe coding in February 2025.

(https://en.wikipedia.org/wiki/Vibe_coding)

emp17344 · 32d ago

How are you justified in saying “ AI is a reflection of the self”? LLMs just make connections between text within a data set. AI is a reflection of the data it was trained on, not a reflection of the user.

suddenlybananas · 32d ago

Way to pat yourself on the back for using a product.

benhurmarcel · 32d ago

Honestly I've already had to work with crappier consultants.

Also, there's a lot of value already in a crappy but fast and cheap consultant.

Show HN: Tapmytab – an open-source, Kanban with rich text editor on Chrome tab (github.com)

Show HN: ZeroConfigDNLA – Easy to run media server in Python (github.com)

Show HN: Tail Lens – Tailwind editor in browser (taillens.io)

Show HN: McWig – A modal, Vim-like text editor written in Go (github.com)

Show HN: I wrote a BitTorrent Client from scratch (github.com)

Show HN: Tritium – The Legal IDE in Rust (tritium.legal)

Show HN: Tattoy – a text-based terminal compositor (tattoy.sh)

Show HN: Qrkey – Offline private key backup on paper (github.com)

Show HN: Eyesite – Experimental website combining computer vision and web design (blog.andykhau.com)

Show HN: AIButton – Like AI Pin – but only one button to press, Made in Germany (aibutton.io)

Show HN: DIY virtual HDMI monitor using "AR" glasses (github.com)

Show HN: Shelly, terminal assistant that translates natural language into shell (github.com)

Show HN: Spark, An advanced 3D Gaussian Splatting renderer for Three.js (sparkjs.dev)

Show HN: Most users won't report bugs unless you make it stupidly easy

Show HN: A “Course” as an MCP Server (mastra.ai)

Show HN: Tool-Assisted Speedrunning the Boring Parts of Animal Crossing (GCN) (github.com)

Show HN: The Roman Industrial Revolution that could have been (thelydianstone.com)

Show HN: Ikuyo a Travel Planning Web Application (ikuyo.kenrick95.org)

Show HN: High End Color Quantizer (github.com)

Show HN: S3mini – Tiny and fast S3-compatible client, no-deps, edge-ready (github.com)

Show HN: I made a 3D printed VTOL drone (tsungxu.com)

Show HN: StellarSnap – Explore NASA APODs, simulate orbits, learn astronomy (stellarsnap.space)

Show HN: S3mini(v0.2) – Basic S3 Support for Ceph and Oracle Object Storage (github.com)

Show HN: RomM – An open-source, self-hosted ROM manager and player (github.com)

Show HN: Chili3d – A open-source, browser-based 3D CAD application

Show HN: PyDoll – Async Python scraping engine with native CAPTCHA bypass (github.com)

Show HN: GetHooky – a language-agnostic Git hook manager (ezpieco.github.io)

Show HN: I built a cleaner YouTube – no ads, sponsors, or doomscrolling (skipcut.com)

Show HN: Munal OS: a graphical experimental OS with WASM sandboxing (github.com)

Show HN: Let’s Bend – Open-Source Harmonica Bending Trainer (letsbend.de)

Show HN: MidWord – A Word-Guessing Game (midword.com)

Show HN: AnyCrawl v0.0.1-alpha.5 – custom user-agent and richer scraping API (github.com)

Show HN: SharkMCP, a Tshark MCP Server (github.com)

Show HN: Tidalbase – Pair programming platform for solo devs and open source (tidalbase.com)

Show HN: Automate final cut pro's XML language

Show HN: Stop It – an iOS app using visual therapy to help break habits (stop-it.app)

Show HN: Gem and I built an open-source app to learn Japanese (nihongo.site)

Show HN: Glowstick – type level tensor shapes in stable rust (github.com)

Show HN: Somo – a human friendly alternative to netstat (github.com)

Show HN: Dead simple clock for hidden menubar users (apps.apple.com)

Show HN: ChatToSTL – AI text-to-CAD for 3D printing (huggingface.co)

Show HN: curlmin – Curl Request Minimizer (github.com)

Show HN: TypeScript DSL for expressive AWS SNS filters as type-safe code (github.com)

Show HN: Timerge – Smart, effortless break reminder for macOS (likang.dev)

Show HN: I am making an app to rival "Everything" (drimiteros.github.io)

Show HN: Interactive Enigma Machine Simulator (enigmasimulator.com)

Show HN: A MCP server and client implementing the latest spec (github.com)

Show HN: I made CSS-only glitch effect (muffinman.io)

Show HN: Claude Slash Command Suite inspired by Anthropics best practices guide (github.com)

Show HN: Update to my meta glasses API "Hey Meta send a message to ChatGPT" (github.com)

AI Is Like a Crappy Consultant

Comments (113)