Ask HN: Go deep into AI/LLMs or just use them as tools?
135 points by pella_may on 5/24/2025, 7:05:46 AM | 100 comments
I'm a software engineer with a solid background in full-stack web development. With all the noise around LLMs and AI, I'm undecided between two paths:
1. Invest time in learning the internals of AI/LLMs, maybe even switching fields and working on them
2. Continue focusing on what I'm good at, like building polished web apps, and treat AI as just another tool in my toolbox
I’m mostly trying to cut through the hype. Is this another bubble that might burst or consolidate into fewer jobs long-term? Or is it a shift that’s worth betting a pivot on?
Curious how others are approaching this—especially folks who’ve made a similar decision recently.
It probably helps a little to understand some of the internals and math. Just to get a feel for what the limitations are.
But your job as a software engineer is probably to stick things together and bang on them until they work. I sometimes describe what I do as being a glorified plumber. It requires skills but surprisingly few skills related to math and algorithms. That stuff comes in library form mostly.
So, get good at using LLMs and integrating what they do into agentic systems. Figure out APIs, limitations, and learn about different use cases. Because we'll all be doing a lot of work related to that in the next few years.
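For instance, the "figure out APIs" part can start very small. A minimal sketch, assuming the `openai` Python package and an API key in the environment; the model name and prompts are just placeholders:

    # Minimal LLM API call: the building block you'll wire into larger systems.
    # Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; swap in whatever model/provider you use
        messages=[
            {"role": "system", "content": "You are a terse code-review assistant."},
            {"role": "user", "content": "Summarize the risks of this SQL: DELETE FROM users;"},
        ],
    )

    print(response.choices[0].message.content)

Everything else (tool calls, retries, context management) is layers on top of this one request/response shape.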
Not quite the same. E.g. databases are a part of the system itself. It's actually pretty helpful for a SWE to understand them reasonably deeply, especially when they're so leaky as an abstraction (arguably, even the more nuanced characteristics of your database of choice will influence the design of your whole application). AI/LLMs are more like dev tooling. You don't really need to know how a text editor, compiler or IDE works.
> There is no perfect software yet.
"Software" you refer to is actually 'software product', not merely 'code'. So the reality is that even with exceptional programming talent, the art of making great software products is out of reach of most teams and companies. Vision, management, product development, accurate grasp of the user needs, ..., none of these are "programming" skills.
my middle manager buzzwords this 26 times a day. triggers me.
1. Learn basic NNs at a simple level: build from scratch (no frameworks) a feed-forward neural network with backpropagation and train it against MNIST or something equally simple. Understand every part of it. Just use your favorite programming language (a minimal sketch in Python follows this list).
2. Learn (without having to implement the code, or understand the finer parts of the implementations) how the NN architectures work and why they work. What is an encoder-decoder? Why does the encoder part produce an embedding? How does a transformer work? What are the logits in the output of an LLM, and how does sampling work? Why is attention quadratic? What are Reinforcement Learning and ResNets, and how do they work? Basically: you need a solid qualitative understanding of all that.
3. Learn the higher-level layer, both from the POV of the open-source models (how to interface with llama.cpp / ollama / ..., how to set the context window, what quantization is and how it affects performance/quality of output), and also how to use popular provider APIs like DeepSeek, OpenAI, Anthropic, ... and which model is good for what.
4. Learn prompt engineering techniques that influence the quality of the output when using LLMs programmatically (as a bag of algorithms). This takes patience and practice.
5. Learn how to use AI effectively for coding. This is absolutely non-trivial, and a lot of good programmers are terrible LLM users (and end up believing LLMs are not useful for coding).
6. Don't get trapped into the idea that the news of the day (RAG, MCP, ...) is what you should spend all your energy on. This is just some useful technology surrounded by a lot of hype from people who want to get rich with AI and understand they can't compete with the LLMs themselves. So they pump the part that can be kinda "productized". Never forget that the product is the neural network itself, for the most part.
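For point 1 above, here's a minimal sketch of the from-scratch exercise, assuming Python and numpy and using XOR as a toy stand-in for MNIST so it stays short; the real exercise would swap in the MNIST images and a softmax output layer:

    # A tiny feed-forward network with backpropagation, trained on XOR.
    # Toy stand-in for the MNIST exercise: same moving parts, much smaller data.
    import numpy as np

    rng = np.random.default_rng(0)

    # Training data: XOR inputs and targets.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One hidden layer: 2 -> 8 -> 1, sigmoid activations throughout.
    W1 = rng.normal(scale=1.0, size=(2, 8)); b1 = np.zeros((1, 8))
    W2 = rng.normal(scale=1.0, size=(8, 1)); b2 = np.zeros((1, 1))

    lr = 0.5
    for step in range(10000):
        # Forward pass
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)

        # Squared-error loss (MNIST would use softmax + cross-entropy instead)
        loss = 0.5 * np.sum((out - y) ** 2)

        # Backward pass: chain rule, layer by layer
        d_out = (out - y) * out * (1 - out)      # gradient at output pre-activation
        dW2 = h.T @ d_out
        db2 = d_out.sum(axis=0, keepdims=True)
        d_h = (d_out @ W2.T) * h * (1 - h)       # gradient at hidden pre-activation
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0, keepdims=True)

        # Plain gradient-descent update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

        if step % 2000 == 0:
            print(f"step {step}: loss {loss:.4f}")

    print(out.round(3))  # should approach [0, 1, 1, 0]

Once every line of something like this makes sense, points 2 and onward are mostly about mapping the same ideas onto much bigger architectures.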
And people keep saying you need to make a plan first, and then let the agent implement it. Well I did, and had a few markdown files that described the task well. But Copilot's Agent didn't manage to write this Swift code in a way that actually works - everything was subtly off and wrong, and untangling would have taken longer than rewriting it.
Is Copilot just bad, and I need to use Claude Code and/or Cursor?
I haven't used Claude Code much, so cannot really speak of it. But Copilot and Cursor tend to make me waste more time than I get out of them. Aider running locally with a mix-and-match of models depending on the problem (lots of DeepSeek Reasoner/Chat since it's so cheap), and Codex, are both miles ahead of at least Copilot and Cursor.
Also, most of these things seem to run with temperature > 0.0, so doing multiple runs, even better with multiple different models, tends to give you better results. My own homegrown agent runs Aider multiple times with a combination of models and gives me a list of candidates to choose between; then I either straight up merge the best one, or iterate on the best one, sometimes taking inspiration from the others.
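A minimal sketch of that multiple-runs idea, again assuming the `openai` Python client; the model name and prompt are placeholders, and judging the candidates is left to you (or your test suite):

    # Run the same request several times at temperature > 0 and collect candidates.
    # Assumes the `openai` package; model name and prompt are placeholders.
    from openai import OpenAI

    client = OpenAI()
    prompt = "Rewrite this recursive function iteratively:\n\ndef fib(n): ..."

    candidates = []
    for _ in range(3):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",   # or mix different models across runs
            temperature=0.8,        # > 0, so the runs actually differ
            messages=[{"role": "user", "content": prompt}],
        )
        candidates.append(resp.choices[0].message.content)

    # Then diff the candidates, run the tests, and merge or iterate on the best one.
    for i, text in enumerate(candidates, 1):
        print(f"----- candidate {i} -----\n{text}\n")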
Even the basic chat UI is a structure built around a foundational model; the model itself has no capability to maintain a chat thread. The model takes context and outputs a response, every time.
For more complex processes, you need to carefully curate what context to give the model and when. There are many applications where you can say "oh, chatgpt can analyze your business data and tell you how to optimize different processes", but good luck actually doing that. That requires complex prompts and sequences of LLM calls (or other ML models), mixed with well-defined tools that enable the AI to return a useful result.
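To make the statelessness concrete: the "chat" is just a list of messages you keep resending in full. A minimal sketch, assuming the `openai` Python client and a placeholder model name:

    # The model is stateless: the "conversation" exists only because we resend it.
    from openai import OpenAI

    client = OpenAI()
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    for user_input in ["What's a vector DB?", "Give me one example."]:
        history.append({"role": "user", "content": user_input})
        resp = client.chat.completions.create(
            model="gpt-4o-mini",   # placeholder model name
            messages=history,       # the full context goes into every single call
        )
        answer = resp.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        print(f"> {user_input}\n{answer}\n")

The second question only makes sense because the loop keeps sending the earlier turns; drop the history and the model has no idea what "one example" refers to.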
This forms the basis of AI engineering - which is different from developing AI models - and this is what most software engineers will be doing in the next 5-10 years. This isn't some kind of hype that will die down as soon as the money gets spent, a la crypto. People will create agents that automate many processes, even within software development itself. This kind of utility is a no-brainer for anyone running a business, and hits deeply in consumer markets as well. Much of what OpenAI is currently working on is building agents around their own models to break into consumer markets.
I recommend anyone interested in this to read this book: https://www.amazon.com/AI-Engineering-Building-Applications-...
The progress we are seeing in agents is 99% due to new LLMs being semantically more powerful.
Any suggestions on where to start with point 1? (Also a SWE.)
I've been asking this on every AI coding thread. Are there good YouTube videos of people using AI on complex codebases? I see tons of "build tic-tac-toe in 5 minutes" type videos, but not much on bigger, established codebases.
So in that case I don’t see why not?
I could imagine that even those "ancient" techniques might some day make a comeback. They're far inferior to LLMs in terms of expressive power, but they also require literally orders of magnitude less memory and computation power. So when the hype dies down, interest in solutions that don't require millions in hardware cost or making your entire business dependent on what Sam Altman and Donald Trump had for breakfast might have a resurgence. Also, interestingly enough, LLMs could even help in this area: Most of those old techniques require an abundance of labeled training data, which was always hard to achieve in practice. However, LLMs are great at either labeling existing data or generating new synthetic data that those systems could train on.
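As a sketch of that last point, assuming scikit-learn, with toy texts and labels hardcoded where an LLM labeling pass would normally supply them:

    # Train a cheap classical classifier on LLM-provided labels.
    # The texts/labels are toy data; in practice the labels would come from an
    # LLM labeling pass (or from LLM-generated synthetic examples).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = [
        "My package never arrived and support won't answer.",
        "Great product, exactly as described.",
        "The refund process took three weeks, unacceptable.",
        "Fast shipping and friendly service, thank you!",
    ]
    llm_labels = ["negative", "positive", "negative", "positive"]  # pretend an LLM produced these

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, llm_labels)

    # The resulting model is tiny and runs anywhere: no GPU, no API calls.
    print(clf.predict(["Item arrived broken and nobody responds to emails."]))

The point isn't this particular model; it's that the expensive LLM does the labeling once, and a cheap classical model serves the predictions forever.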
there aren't 100x 'top shelf' ml engineers.
There aren't a lot of jobs for self-taught ML programmers like there are for self-taught Python programmers.
I see this a lot, but I think it's irrelevant. Even if this is a bubble, and even if (when?) it bursts, the underlying tech is not going anywhere. Just like the last dotcom bubble gave us FAANG+, so will this give us the next letters. Sure, agentsdotcom or flowsdotcom or ragdotcom might fail (likely IMO), but the stack is here to stay, and it's only gonna get better, cheaper, more integrated.
What is becoming increasingly clear, IMO, is that you have to spend some time with this. Prompting an LLM is like the old google-fu. You need to gain experience with it, to make the most out of it. Same with coding stacks. There are plenty of ways to use what's available now, as "tools". Play around, see what they can do for you now, see where it might lead. You don't need to buy into the hype, and some skepticism is warranted, but you shouldn't ignore the entire field either.
However, that's not the case with AI (agentic AI / LLMs), simply because they already have a use case, and a valid one. Contextual querying and document search / knowledge digging are here to stay, whether in the form of the current agentic model or a different one.
But as for my 2 cents, knowing machine learning has been valuable to me, but not anywhere near as valuable as knowing software dev. Machine learning problems are much more rare and often don’t have a high return on investment.
1) Established companies (meta/google/uber) with lots of data that want MLEs to make 0.1% improvements, because each of those improvements is worth millions.
2) Startups mostly proxying OpenAI calls.
The first group is definitely not hype. Their core business relies on ML and they don’t need hype for that to be true.
For the second group, it depends on the business model. The fact that you can make an API call doesn’t mean anything. What matters is solving a customer problem.
I also (selfishly) believe a lot of the second group will hire folks to train faster and more personalized models once their business models are proven.
Between your two options, I’d lean toward continuing to build what you’re good at and using AI as a powerful tool, unless you genuinely feel pulled toward the internals and research side.
I’ve been lucky to build a fun career in IT, where the biggest threats used to be Y2K, the dot-com bubble, and predictions that mobile phones would kill off PCs. (Spoiler: PCs are still here, and so am I.)
The real question is: what are you passionate enough about to dive into with energy and persistence? That’s what will make the learning worth it. Everything else is noise in my opinion.
If I had to start over today, I'd definitely be in the same uncertain position, but I know I'd still just pick a direction and adapt to the challenges that come with it. That’s the nature of the field.
Definitely learn the fundamentals of how these AI tools work (like understanding how AI tools process context or what transformers actually do). But don’t feel like you need to dive head-first into gradient descent to be part of the future. Focus on building real-world solutions, where AI is a tool, not the objective. And if a cheese grater gets the job done, don’t get bogged down reverse-engineering its rotational torque curves. Just grate the cheese and keep cooking.
That’s my 2 cents, shredded, not sliced.
Or, if you believe there may be some merit to the "AI is coming for your job" meme, but really don't want to do blue collar / skilled trades work, at least go in with the mindset of "the people who build, operate, and maintain the AI systems will probably stay employed at least a little bit longer than the people who don't". And then figure out how to apply that to deciding between one or both of your (1) and (2) options. There may also be some white collar jobs that will be safe longer due to regulatory reasons or whatever. Maybe get your physician's assistant license or something?
And yes, I'm maybe playing "Devil's Advocate" here a little bit. But I will say that I don't consider the idea of a future where AI has meaningful impact on employment for tech professionals to be entirely out of the question, especially as we extend the timeline. Whatever you think of today's AI, consider that it's as bad right now as it will ever be. And ask what it will be like in 1 year. Or 3 years. Or 7 years. Or 10 years. And then try to work out what position you want to be in at those points in the timeline.
It's not IT, where you can create value from thin air and thus grow the market and increase the need for even more professionals.
As soon as a tiny percentage goes into trades (I bet tons of new people are already doing this), the market will be oversaturated in a few years when they finish their apprenticeships.
After that it will be harder to find a job than in IT with AI around the corner.
Look, I don't know if any of this is actually going to come to pass or not. But it seems at least a little bit less like pure sci-fi now than it did a decade or two ago.
Anyway, if we play along with the thought experiment of asking "what happens to our society when a very large swathe of the human population is no longer needed to exchange their labor for wages?" it really leads one to wonder what kind of economic system(s) we'll have and if we'll find a way to avoid a straight up dystopian hellscape.
Call me optimistic or whatever, but isn't that the best case scenario? If having a full-time job is basically just for the 0.1% or whatever, then we must have figured out a different way of distributing goods and solving people's needs that doesn't involve "trading time for money" (a job), and that sounds like it can be a good thing, not a "worst case scenario".
> Call me optimistic or whatever, but isn't that the best case scenario?
The gains of technology are mostly captured by those with capital, not those with labor. Look at wage growth over the last few decades as well as productivity growth to have confirmation.
There’s no reason to believe given the current trend that benefits will be evenly distributed to the 99-97% of wage earners.
Right, but in this hypothetical future where "basically nobody has a job", would we still live in such a system? If so, where does the money come from if people don't work for it?
If you're good at what you're doing right now and you enjoy it — why change? Some might argue that AI will eventually take your job, but I strongly doubt that.
If you're looking for something new because you are bored, go for it. I tried to wrap my head around the basics of LLMs and how they work under the hood. It’s not that complicated — I managed to understand it, wrote about it, shared it with others, and felt ready to go further in that direction. But the field moves fast. While I grasped the fundamentals, keeping up took a lot of effort. And as a self-taught “expert,” I’d never quite match an experienced data scientist.
So here I am — extensively using AI. It helps me work faster and has broadened my field of operation.
If you are asking whether the future will boost demand for building AI (i.e. for clients), we could say: probably so, given the renewed awareness. It may not be about LLMs, and it should not be at this stage (they can hardly be made reliable, and that can hurt your reputation).
Follow the Classical Artificial Intelligence course, MIT 6.034, from Prof. Patrick Winston - as a first step.
For example, if we wanted to conduct an analysis with a new piece of software, it wasn't enough to run the software: we needed to be able to explain the theory behind it (basically, to be able to rewrite the tool).
From that standpoint, I think that even if you keep with #2, you might benefit from taking steps to gain the understanding from #1. It will help you understand the models' real advantages and disadvantages to help you decide how to incorporate them in #2.
Very wise advice! And the more complex systems are, the more this is truly needed.
But I believe that the real value will come after the bubble has burst, and the companies which truly create value will survive, same as with webpages after the dot-com bubble.
1/ There aren't many jobs in this space. There are still far more companies (and roles) that need 'full-stack development' than those focused on 'AI/LLM internals.' With low demand for AI internals and a high supply of talent—many people have earned data science certificates in AI hoping to land lucrative jobs at OpenAI, Anthropic, etc.—the bar for accessing these few roles is very high.
2/ The risk here is that AI makes everyone good at full-stack. This means more competition for roles and less demand for roles (now one inexperienced engineer with AI can output 1.5x the code an experienced senior engineer could in 2020).
In the short/medium term, 2/ has the best risk/reward function. But 1/ is more future proof.
Another important question is where you are in your career. If you're 45 years old, I'd encourage you to switch into leadership roles for 2/. Those won't be replaced by AI. If you're early in your career, it could make more sense to switch.
The data scientist roles have had a similar drift in my experience. They used to be "statistician who can code" or "developer who knows some stats", what we got is "clicks buttons in the Azure GUI".
Data Drift. Over the course of a few months the data deteriorates and the LLM ceases to function in a worthwhile manner.
Currently most LLMs are based upon the core premise that people should not believe anything. This is tokenized above everything else. Then there are other erroneous tokenizations. These are not fully documented, yet people use these tools. You should know what you are getting.
Tokenization. Different words are tokenized to have a higher value than other words or configurations.
So, these are the dangers that everyone has ignored. It makes it an unethical tool because it is based upon someone's erroneous views. Honesty is the best policy. If it can't be honest, how can you trust it?
Why can you trust Wikipedia but not AI? Because its sources have been verified by many people. So if you come across a new Wikipedia page that hasn't been verified much, you need to treat it the same as AI: verify it yourself by cross-checking it with other sources.
Most LLMs I used made lots of mistakes, but Codex with the $200 subscription changed my workflow totally, and now I'm getting 40 pull requests/day merged.
Treat LLMs as interns, increase your test coverage with them to the point that they can't ruin your codebase and get really good at reviewing code and splitting tasks up to smaller digestible ones, and promote yourself as team leader.
I gave it an honest chance, but couldn't get a single PR out of it. It would just continue to make mistakes. And even when it got close, I asked it for a minor tweak and it made things worse. I iterated 7 times on the same small problem.
Currently in the stage of evaluating Codex (mostly comparing it to Aider and my own homegrown LLM setup). I'm able to get changes out of it, that mostly make sense, but you really need to take whatever personal guidelines you have for coding and "encode" them into the AGENTS.md, and really focus on asking the right question/request changes in the right way.
Without AGENTS.md, it seems to go off the rails really quickly and end up with subpar code. But with a little bit of guidance, I do get some results at least. This is the current AGENTS.md I'm using for some smaller projects: https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313...
With that said, it does get misled sometimes, and the UX isn't great for the web version. It's really slow, you can't customize the environment, the UI seems to load data in a really weird way leading to slowdowns and high latencies, and overall it's just cumbersome. My homegrown version is way faster for the iterations, + has stateful PRs it can iterate on and receive line comment feedback on, but the local models I'm using are obviously worse than the OpenAI ones, so I'd still say Codex is probably overall better, sadly.
https://github.com/adamritter/pageql
This will be interesting to look at, thanks for sharing!
If you want to switch fields and work on LLM internals/fundamentals in a meaningful way, you'd probably want to become a research scientist at one of the big companies. This is pretty tough because that's almost always gated by a PhD requirement.
So I would learn things that are either fun for you to learn or things that you can directly apply.
For AI this means you probably should learn about it if you are really interested and enjoy going through build-your-own-NN tutorials or if you have good chances of switching to a role where you can use your new skills.
Edit: Basically investing anything (also time) is risky. So invest in things that directly pay off or bring you joy regardless of the outcome.
Do you like science? Then dive deep into LLMs. Be ready for science, though. It involves shooting a thousand shots in the dark until you discover something new. That's how science gets done. I respect it, but I personally don't love doing it.
Do you like engineering? That's when you approach a problem and can reason about a number of potential solutions, weigh the pros and cons of each, and pick one with the appropriate trade-offs. It's pretty different from science.
You don’t need much expertise in NNs to still be able to get huge value out of them today
[1]https://news.ycombinator.com/item?id=44079296
Do you need to understand how the circular saw and drill are made?
That doesn't mean knowing every single bit there is to know about it, but a basic understanding will go a long way in correctly using it.
You don't need to deep-dive into the math. You'll need to understand the limitations, the performance bottlenecks, etc.: RAG, vector DBs, and so on.
Just because a field is popular, doesn't mean you should switch to going deep into it. But that doesn't mean you shouldn't use it - it costs a few dollars to try it out and see whether it fits your workflow. If it does, this is great and you can be more productive and focus on stuff that you can solve and LLM can't, but if it doesn't, that is fine too.
It is worth getting used to that mindset, and then using LLMs as a tool (they are likely here to stay, because big tech has started to integrate features based on them everywhere, for better or worse). So this is your option (2.). Personally, I prefer the software I use NOT to be smart, but to be 100% deterministic.
But already my favorite LaTeX authoring environment (Overleaf) has a "fix this" button pop up that auto-resolves syntax errors, many of which overwhelm my doctoral students, who no longer read books end-to-end (here, to learn LaTeX).
Gradually, you may dive deeper into the "how", driven by either need or curiosity, so eventually you will probably have done both (2.) and (1.). - in the same way that you will have first learned SQL before learning how replication, transactions, data buffers and caches, or query optimizers are implemented.
RAG could largely be replaced with tool use to a search engine. You could keep some of the approach around indexing/embeddings/semantic search, but it just becomes another tool call to a separate system.
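A minimal sketch of that "just another tool call" framing, assuming the `openai` client's tool-calling interface and a hypothetical `search_docs()` function backed by whatever search engine you already run:

    # Expose an existing search engine to the model as a tool, instead of building
    # a bespoke RAG pipeline. `search_docs` is hypothetical; plug in whatever you have.
    import json
    from openai import OpenAI

    client = OpenAI()

    def search_docs(query: str) -> str:
        # Stand-in for Elasticsearch / your wiki / an internal search API, etc.
        return "Result 1: ...\nResult 2: ..."

    tools = [{
        "type": "function",
        "function": {
            "name": "search_docs",
            "description": "Search the internal documentation and return the top results.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    messages = [{"role": "user", "content": "How do we rotate the staging DB credentials?"}]
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    msg = resp.choices[0].message

    if msg.tool_calls:
        messages.append(msg)  # keep the assistant's tool call in the transcript
        for call in msg.tool_calls:
            result = search_docs(**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
        final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
        print(final.choices[0].message.content)

Whether the tool is a keyword index, an embedding store, or a plain grep over the repo becomes an implementation detail of `search_docs`, not of the LLM integration.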
How would you feel about becoming an expert in something that is so in flux and might disappear? That might help give you your answer.
That said, there's a lot of comparatively low hanging fruit in LLM adjacent areas atm.
Isn't that true for almost every subject within computers though, except more generalized concepts like design/architecture, problem solving and more abstract skills? Say you learn whatever popular "Compile-to-JS" language (probably TypeScript today) or Kubernetes, there is always a risk it'll fade in popularity until not many people use it.
I'm not saying it's a problem, as said by someone who favors a language people constantly claim is "dying" or "disappearing" (Clojure), but more that this isn't exclusive to the LLM/ML space, it just seems to happen slightly faster in that ecosystem.
So instead, embrace change, go with what feels right and learn whatever seems interesting to you, some things stick around, others don't (like Coffeescript), hopefully you'll learn something even if it doesn't stick around.
Worst case, you’ll be a more interesting, well-rounded, and curious person with a broad set of programming skills.
Work for companies (as a consultant?) to help them implement LLMs/AI into their traditional processes?
Just learn this and its prerequisites
Are you asking to get a PhD or use them as tools?
Learning to work with their outputs (which is what I do) can be much more rewarding: building apps based around generative outputs, working with latency, token costs, and rate limits as constraints, writing evals as much as you write tests, RAG systems and embeddings, etc.
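For the "write evals as much as you write tests" part, a minimal sketch, assuming a hypothetical `ask_model()` wrapper around whatever LLM call you already make (stubbed here so the file runs):

    # A tiny eval harness: fixed cases, a cheap pass/fail check, and a score.
    def ask_model(prompt: str) -> str:
        # Hypothetical wrapper around your LLM call; stubbed so the sketch runs.
        return "EUR"

    EVAL_CASES = [
        {"prompt": "Extract the ISO currency code from: 'Total: 42.50 EUR'", "must_contain": "EUR"},
        {"prompt": "Extract the ISO currency code from: 'Price $13'", "must_contain": "USD"},
    ]

    def run_evals():
        passed = 0
        for case in EVAL_CASES:
            output = ask_model(case["prompt"])
            ok = case["must_contain"].lower() in output.lower()
            passed += ok
            print(("PASS" if ok else "FAIL"), case["prompt"])
        print(f"{passed}/{len(EVAL_CASES)} passed")

    if __name__ == "__main__":
        run_evals()

Unlike unit tests, you rerun these whenever you change the prompt or the model, and you track the pass rate over time rather than expecting 100%.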
For like five years now we've been promised a revolution that has yet to appear and is still lighting billions of dollars on fire. Don't bet on it materialising tomorrow.
If you need comforting, go read Merleau-Ponty and Heidegger, perhaps condensed down as Hubert Dreyfus.
We’re in a funny moment. Right now, AI tech is so powerful and capable that people are categorically underestimating its value and systematically underusing it, whatever the hype is signalling. If the tech froze right now, there’s decades of applications to mine.
Lots of great products being built on that thesis. The strategy is: unlock more of their present capability, harness that for a wider audience’s use case.
In that way you do both — leverage the tools, and in becoming an expert user, you can find yourself a vendor of very valuable guidance — and a builder of desperately sought-after products.
It’s up to you where your morals lie and how important money is compared to those morals, but it seems like AI is here to stay.