The current state of LLM-driven development
70 points by Signez on 8/9/2025, 4:17:16 PM | 35 comments | blog.tolki.dev ↗
I have never heard anybody successfully using LLMs say this before. Most of what I've learned from talking to people about their workflows is counterintuitive and subtle.
It's a really weird way to open up an article concluding that LLMs make one a worse programmer: "I definitely know how to use this tool optimally, and I conclude the tool sucks". Ok then. Also: the piano is a terrible, awful instrument; what a racket it makes.
He is actually recommending Copilot for price/performance reasons and his closing statement is "Don’t fall for the hype, but also, they are genuinely powerful tools sometimes."
So it just seems like he never really tried to engineer better prompts that these more advanced models can use.
Because for all our posturing about being skeptical and data-driven, we all believe in magic.
Those "counterintuitive non-trivial workflows"? They work about as well as just prompting "implement X" with no rules, agents.md, careful lists, etc.
Because 1) literally no one actually measures whether the magical incantations work, and 2) it's impossible to make such measurements due to non-determinism.
The blogging output on the other hand ...
> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.
Learning how to use LLMs in a coding workflow is trivial to start, but you quickly sour on them if you don't learn to adapt both your workflow and theirs. It is easy to get a trivially good result and then be disappointed in the follow-up. It is easy to start on something they're not good at and conclude they're worthless.
The flat dismissal of Cursor, for example, suggests the author never learned how to work with it. Now, it's certainly limited, and some people just prefer Claude Code; I'm not saying that's unfair. However, it requires a process adaptation.
Not everyone with a different opinion is dumber than you.
Just like I can recognize a clueless frontend developer when they say "React is basically just a newer jQuery", recognizing clueless engineers when they talk about AI can be pretty easy.
It's a sector that is both old and new: AI has been around forever, but even people who worked in it years ago are taken aback by what is suddenly possible and by the workflows that are emerging... hell, I've even seen the very people who have followed GenAI forever carry a bias towards believing it's incapable of what it can now do.
For context, I lead an AI R&D lab in Europe (https://ingram.tech/). I've seen some shit.
Copilot isn't an LLM, for a start. You _combine_ it with a selection of LLMs. And it absolutely has severe limitations compared to something like Claude Code in how it can interact with the programming environment.
"Hallucinations" are far less of a problem with software that grounds the AI to the truth in your compiler, diagnostics, static analysis, a running copy of your project, runnning your tests, executing dev tools in your shell, etc.
You're being overly pedantic here and moving goalposts. Copilot (for coding) without an LLM is pretty useless.
I stand by my assertion that these tools are all basically the same fundamental tech - LLMs.
I haven't found that to be true with my most recent usage of AI. I do a lot of programming in D, which is not popular like Python or JavaScript, but Copilot knows it well enough to help me with things like templates, metaprogramming, and interoperating with GCC-produced DLLs on Windows. This is true in spite of the lack of a big pile of training data for these tasks. Importantly, it gets just enough things wrong when I ask it to write code for me that I have to understand everything well enough to debug it.
The reach is big enough to not care about our feelings. I wish it wasn't this way.
I recently started a fresh project, and until I got to the desired structure I only used AI to ask questions or get suggestions. I organized and wrote most of the code.
Once it started to take a shape that felt semi-permanent to me, I ran a lot of queries like:
```
- Look at existing service X in folder services/x
- see how I deploy the service using k8s/services/x
- see what the Dockerfile for service X looks like at services/x/Dockerfile
- now, I have started service Y that does [this and that]
- create everything needed for service Y to be scaffolded and deployed, following the same pattern as service X
```
And it would go off, read the existing stuff for X, then generate all of the deployment/monitoring/README/Docker/k8s/Helm/Skaffold files for Y.
With few to no mistakes. Both Claude and Gemini are more than capable of such a task. I had both of them generate 10-15 files with no errors, with the code deployable right away (of course, the service will just answer and not do much more than that).
Then I take over again for a bit, write some business logic specific to Y, then again leverage AI to fill in missing bits, review, suggest things, etc.
It might look slow, but it actually cuts out the most boring and most error-prone steps when developing a medium-to-large k8s-backed project.
LLMs will always suck at writing code that has not been written millions of times before. As soon as you venture slightly off-road, they falter.
That right there is your learning curve! Getting LLMs to write code that's not heavily represented in their training data takes experience and skill and isn't obvious to learn.
My personal experience has been that AI has trouble keeping the scope of a change small and targeted. I have only been using Gemini 2.5 Pro though, as we don’t have access to other models at my work. My friend tells me he uses Claude for coding and Gemini for documentation.
If you go by MBA types on LinkedIn who aren’t really developers or haven’t been in a long time, now they can vibe out some React components or a Python script, so it’s a revolution.
I tend to strongly agree with the "unpopular opinion" about the IDEs mentioned versus CLI (specifically, aider.chat and Claude Code).
Assuming (this is key) you have mastery of the language and framework you're using, working with the CLI tool under 25-year-old XP practices is an incredible accelerant.
Caveats:
- You absolutely must bring taste and critical thinking, as the LLM has neither.
- You absolutely must bring systems thinking, as it cannot keep deep weirdness "in mind". By this I mean the second- and third-order gotchas about how things ought to work but don't.
- Finally, you should package up everything new about your language or frameworks since a few months or a year before the knowledge cutoff date, and include a condensed synthesis in your context (e.g., Swift 6 and 6.1, versus the Swift 5.10 and 2024 WWDC announcements that are all GPT-5 knows).
For this last one I find it useful to (a) use OpenAI's "Deep Research" to first whitepaper the gaps, then another pass to turn that into a Markdown context prompt, and finally bring that over to your LLM tooling to include as needed when doing a spec or in architect mode. Similarly, (b) use repomap tools on dependencies if creating new code that leverages those dependencies, and have that in context for that work.
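For (b), a repomap can be as simple as an outline of exported declarations. A rough TypeScript sketch of the idea (an assumption about how such a tool can work, not any specific tool's code):
```
// Minimal repomap sketch: walk a dependency's source tree and emit a
// Markdown outline of exported declarations, to paste into the context
// prompt. The regex-based extraction is deliberately crude.
import { readdirSync, readFileSync, statSync } from "fs";
import { join, extname } from "path";

function* walk(dir: string): Generator<string> {
  for (const entry of readdirSync(dir)) {
    const full = join(dir, entry);
    if (statSync(full).isDirectory()) yield* walk(full);
    else if (extname(full) === ".ts") yield full;
  }
}

export function repoMap(root: string): string {
  const out: string[] = [];
  for (const file of walk(root)) {
    const signatures = readFileSync(file, "utf8")
      .split("\n")
      .filter((line) => /^export\s+(async\s+)?(function|class|interface|type|const)\b/.test(line))
      .map((line) => `- \`${line.trim()}\``);
    if (signatures.length) out.push(`## ${file}`, ...signatures, "");
  }
  return out.join("\n");
}
```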
I'm confused why these two obvious steps aren't built into leading agentic tools, but maybe handling the LLM as a naive and outdated "Rain Man" type doesn't figure into the mental models at most Kool-Aid-drinking "AI" startups, or maybe vibecoders don't care, so it's just not a priority.
Either way, context based development beats Leroy Jenkins.
https://speculumx.at/pages/read_post.html?post=59
It’s not perfect but it’s okay.
Like if you need to crap out a UI based on a JSON payload, make a service call, add a server endpoint, LLMs will typically do this correctly in one shot. These are common operations that are easily extrapolated from their training data. Where they tend to fail are tasks like business logic which have specific requirements that aren’t easily generalized.
I’ve also found that writing the scaffolding for the code yourself really helps focus the agent. I’ll typically add stubs for the functions I want, and create overall code structure, then have the agent fill the blanks. I’ve found this is a really effective approach for preventing the agent from going off into the weeds.
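A sketch of what that scaffolding can look like (TypeScript; the domain, names, and comments are hypothetical — the point is that the structure and contracts are fixed by hand, and the agent only fills in the marked bodies):
```
// Hand-written scaffolding: types and stubs pin down the structure,
// and the agent is asked to fill in the bodies marked TODO.
export interface Invoice {
  id: string;
  lineItems: { sku: string; quantity: number; unitPriceCents: number }[];
  discountPercent: number;
}

// Agent: sum the line items, then apply the discount, rounding down.
export function invoiceTotalCents(invoice: Invoice): number {
  throw new Error("TODO: implement");
}

// Agent: reject negative quantities and discounts outside 0-100;
// return a list of human-readable validation errors.
export function validateInvoice(invoice: Invoice): string[] {
  throw new Error("TODO: implement");
}
```
The stubs double as the prompt: the agent gets the types, the names, and a one-line spec per function, and very little room to wander.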
I also find that if it doesn’t get things right on the first shot, the chances are it’s not going to fix the underlying problems. It tends to just add kludges on top to address the problems you tell it about. If it didn’t get it mostly right at the start, then it’s better to just do it yourself.
All that said, I find enjoyment is an important aspect as well and shouldn’t be dismissed. If you’re less productive, but you enjoy the process more, then I see that as a net positive. If all LLMs accomplish is to make development more fun, that’s a good thing.
I also find that there's use for both terminal based tools and IDEs. The terminal REPL is great for initially sketching things out, but IDE based tooling makes it much easier to apply selective changes exactly where you want.
As a side note, got curious and asked GLM-4.5 to make a token field widget with React, and it did it in one shot.
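For reference, a minimal version of such a widget is roughly this (my own sketch, not GLM-4.5's actual output):
```
// Minimal token field: type, press Enter to add a token, click a token
// to remove it. Styling and edge cases are omitted.
import { useState } from "react";

export function TokenField() {
  const [tokens, setTokens] = useState<string[]>([]);
  const [draft, setDraft] = useState("");

  const addToken = () => {
    const t = draft.trim();
    if (t && !tokens.includes(t)) setTokens([...tokens, t]);
    setDraft("");
  };

  return (
    <div>
      {tokens.map((t) => (
        <button key={t} onClick={() => setTokens(tokens.filter((x) => x !== t))}>
          {t} ✕
        </button>
      ))}
      <input
        value={draft}
        onChange={(e) => setDraft(e.target.value)}
        onKeyDown={(e) => e.key === "Enter" && addToken()}
        placeholder="Add token…"
      />
    </div>
  );
}
```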
It's also strange not to mention DeepSeek and GLM as options given that they cost orders of magnitude less per token than Claude or Gemini.
It becomes farcical when you're not only missing the big thing but are also proud of your ignorance, and this guy is both.