The Rise of 'Vibe Hacking' Is the Next AI Nightmare

31 points · by muzz · 47 comments · 6/5/2025, 4:32:01 PM · wired.com ↗

Comments (47)

tptacek · 12h ago
Software security has been running off large-scale automation for over a decade. LLMs might or might not be a step change in that automation (I'm optimistic but uncertain), but, unlike in conventional software development, the standard arguments about craft and thoughtfulness aren't operative here. If there was an argument to be had, it would have been had around the time Google stood up the mega fuzzing farms.

A fun thing to keep in mind about software security is that it's premised on the existence of a determined and amoral adversary. In the long-long ago, Richard Stallman's host at MIT had no password; anybody could log into it. It was a statement about the ethics of locking down computing resources. That's approximately the position any security practitioner would be taking if they attempted to moralize against LLM-assisted offensive computing.

eyberg · 11h ago
I kinda see the other side of the coin.

"a determined and amoral adversary" - I'd kinda disagree with this (the amoral adversary part being necessary). If you crawl through the vast data breach notification lists that many states are starting to keep - MA, ME, etc. there are so many of them (like literally daily banks, hospitals, etc. are having to report "data breaches" that never ever make the news) - not all of them are happening cause of ransomware. Sometimes it's just someone accidentally not locking a bucket down or not putting proper authorization on a path that should have it. It gets found/fixed but they still have to notify the state. However, if someone doesn't know what they are looking at, or it's a program so it really has no clue what it's looking at and just sees a bunch of data - there's no malicious intent but that doesn't mean that bad things can't happen because that data has now leaked out.

Guess what a lot of these LLMs are training on?

So while Andrey's software is finding all sorts of interesting stuff, there's also a bunch of crap being generated inadvertently that is just bad.

bradyriddle · 12h ago
Say more about these mega fuzzing farms. I haven't heard anything about this.
tptacek · 11h ago
There are fields, endless fields, where kernel zero days are no longer born. They are grown.
AStonesThrow · 11h ago

  rms@gnu.ai.mit.edu
It was actually the A.I. Lab at M.I.T., and they already had their own dedicated subdomain for it. This had to have been around 1990-91. And IIRC, the actual admins made a valiant effort to keep all the shell users away from "root" privileges, so it wasn't a total dumpster fire and the system stayed alive, mostly.

https://en.wikipedia.org/wiki/MIT_Computer_Science_and_Artif...

tptacek · 11h ago
I mean, I remember, in 1994, being on those systems. But it meant nothing. Anybody could be. There wasn't even a glimmer of interestingness about it. It was like "ls"'ing around an anonymous FTP server.
AStonesThrow · 11h ago
Hey, I cannot even begin to describe the thrill I got when I first found my way to the AF.MIL anon-ftp server! It was probably sparsely populated with public domain software and a couple boring games, but it felt like I'd just walked in the front gate of Miramar and witnessed the Blue Angels doing barrel rolls.

Sure, it was basically "a poster on the wall" for the US Air Force, and the US Army guy on Usenet shared nothing about his actual Ballistics Research Labs experiments, but for a college freshman kid, I'd never been on a way k00ler bboard, doodz!!1

anthk · 11h ago
ITS had no root
vouaobrasil · 12h ago
> victory will belong to the savvy blackhat hacker who uses AI to generate code at scale

This is just it: AI, while providing some efficiency gains for the average user, will simply become too powerful. Imagine a superpower that allows you to move objects with your mind. That could be a very nice thing for many to have, because you could probably help people with it. That's the attitude many hacker-types take. The problem is, it would also allow people to kill instantly, which means telekinesis would just be too powerful to set against our animal instincts.

AI is just too powerful – and if more people took a serious stand against it, it might actually be shut down.

rco8786 · 12h ago
Is it even possible to shut it down?
vouaobrasil · 12h ago
Of course it is. If enough people were truly enraged by it, if some leader were to rile up the mob enough, it could be shut down. Revolts have occurred in other parts of the world, and things are getting sufficiently bad that a sizable enough revolt could shut AI down. All we need is a sufficient number of people who are angry enough at AI.
sgjohnson · 12h ago
Good luck shutting down the LLM running on my MacBook.

The Pandora’s Box is open. It’s over.

blibble · 11h ago
a software update could easily cripple its ability to run on your local machine

unless you plan to never update again

diggan · 11h ago
> a software update could easily cripple its ability to run on your local machine

A software update collaborated on by Microsoft, Apple + countless volunteer groups managing various other distributions?

The cat really is out of the bag. You could probably make it a death penalty in the whole world and some people would still use it secretly.

Once things like this run on consumer hardware, I think it's already too late to pull it down fully. You could regulate it, though, and probably have a better chance of limiting the damage; I'm not sure an outright ban could even have the effect you want.

blibble · 11h ago
the nvidia/AMD/apple chips all require proprietary firmware blobs, it can be enforced in there

yes you won't get people that won't ever update, but you'll get the overwhelming majority

and the hardware the never-updaters use will eventually fail and won't be able to be replaced

also: ban the release of new "open" models, they will slowly become out of date and useless

combine these, and the problem will solve itself over time

diggan · 11h ago
> they will slowly become out of date and useless

Models released today are already useful for a bunch of stuff. Maybe over the course of 100 years they could be considered "out of date", but they don't exactly bitrot by themselves just because they sit on a disk; not sure why they'd suddenly "expire" or whatever you're trying to hint at.

And even over the course of 100 years, people will continue doing machine learning research, regardless of whether it's legal or not; the potential benefits (for a select few) seem too good for people to ignore, which is why the current bubble is happening in the first place.

blibble · 11h ago
your hardware won't last 100 years

> And even over the course of 100 year, people will continue the machine learning science

the weak point is big tech: without their massive spending the entire ecosystem will collapse

so that's what we target: politically, legally, technologically and through regulation

we (humanity) only need to succeed in one of these domains once, then their business model becomes nonviable

once you cut off the snake's head, the body will die

the boosters in search of a quick buck will then move onto the next thing (probably "quantum")

diggan · 10h ago
I think you under-estimate how difficult it is to get "most of the world" to agree to anything, and under-estimate how far people are willing to go to make anything survive even when lots of people want that thing to die.
blibble · 10h ago
> I think you under-estimate how difficult it is to get "most of the world" to agree to anything

agreement isn't needed

its success sows the seeds of its own destruction: if it starts eating the middle class, politicians in each and every country who want to remain electable will move towards this position independently of each other

> and under-estimate how far people are willing to go to make anything survive even when lots of people want that thing to die.

the structural funding is such that all you need to do is chop off the funding from big tech

the nerd in their basement with their 2023 macbook is irrelevant

vouaobrasil · 11h ago
Plenty of past civilizations have thought they were invulnerable. In fact, most entities with power think that they can never be taken down. But countless empires in the past have fallen, and countless powerful people have lost their wealth and power overnight.
svachalek · 11h ago
There's a big difference between a civilization being taken down, and civilization being taken down.
rglover · 12h ago
It's just software running on a server...this isn't a Johnny Depp movie [1]. Just flip the power switch on the racks.

[1] https://www.youtube.com/watch?v=0jg3mSf561w

Animats · 11h ago
"Skynet was software; in cyberspace. There was no system core; it could not be shut down"

Yes. Look at how much trouble we have now with distributed denial of service attacks.

Go re-read "Daemon" and "Freedom™", by Daniel Suarez (2006). That AI is dumber than what we have now.

vouaobrasil · 11h ago
On the other hand, if those fighting Skynet were asked to trade Skynet for the AI we have now, they would take it as their new enemy in a heartbeat.
hluska · 11h ago
Rather, it’s many different types of software running on many different systems around the world, each funded by a different party with its own motives. This is no movie…
vouaobrasil · 11h ago
True, but the system only exists because it is currently economically viable. A mass taboo against AI would change that. And many people outside of tech already dislike AI a lot, so it's not inconceivable that this dislike could be fuelled into a worldwide taboo.
diggan · 11h ago
> True, but the system only exists because it is currently economically viable.

The "system" isn't a thing, but more like running apps, some run on servers, other consumer hardware. And the parts that run on consumer hardware will be around even if 99% of the current hyped up ecosystem dies overnight, people won't suddenly stop trying to run these things locally.

rglover · 11h ago
And every single one has a power switch.

I get the general "too many variables" argument, but the idea that humans have no means of stopping any of these apps/systems/algorithms/etc if they get "out of control" (a farce in itself as it's a chat bot) is ridiculous.

It's very interesting to see how badly people want to be living in and being an active participant in a sci-fi flick. I think that's far more concerning than the AI itself.

vouaobrasil · 8h ago
Hmm, good point. Also, when COVID struck, although it took some time, everyone collectively participated in staying home (more or less - I know some people didn't, but the participation was vast). We can do the same if we choose.
byt3bl33d3r · 11h ago
I've written tailored offensive security tools and malware for Red Teams for around a decade and now work in the AI space.

The argument that LLMs will enable "super powered" malware and that existing security solutions won't be able to keep up is completely overblown. I see 0 evidence of this being possible with the current incarnation of "AI" or LLMs.

"Vide coded" malware will be easier to detect if the people creating it don't understand what the code is actually doing and will result in incredible amount of OpSec fails when the malware actually hits the target systems.

I do agree that "vibe coding" will accelerate malware development and generally increase the number of attacks on orgs. However, if you're already applying bog-standard security practices like defense in depth, you shouldn't be concerned about this. If anything, you might want to start thinking about SOC automations in order to reduce alert fatigue.

Stay far away from anyone trying to sell you products to defend against "AI enabled malware". As of right now it's 100% snake oil.

Also, this is probably one of the cringiest articles on the subject I've ever read and is only meant to spread FUD.

I do find the banner video extremely entertaining however.

DanMcInerney · 11h ago
I too write automated offensive tooling. We actually wrote a project, vulnhuntr, that found the first autonomously-discovered 0day using AI. Feed it a GitHub repo and it tracks down user input from source to sink and analyzes for web-based vulnerabilities. Agree this article is incredibly cringy and standard best practices in network and development security will use the same AI efficiency gains to keep up (more or less).
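
For illustration, here is a minimal sketch of the "source to sink" loop described above. It is not vulnhuntr's actual code; the prompt wording, the file heuristic, and the ask_llm() stub are all assumptions standing in for whatever model interface you use.

  # Illustrative sketch only -- NOT vulnhuntr's implementation.
  from pathlib import Path

  PROMPT_TEMPLATE = """Audit the following code for remotely reachable web
  vulnerabilities (SSRF, SQLi, XSS, path traversal, IDOR). Trace user-controlled
  input from its entry point (source) to any dangerous call it reaches (sink),
  and cite file and line for each finding.

  {code}
  """

  def ask_llm(prompt: str) -> str:
      """Stub: plug in whatever model client you use (OpenAI SDK, llm CLI, ...)."""
      raise NotImplementedError

  def candidate_files(repo: Path) -> list[Path]:
      # Crude heuristic: Python files that appear to handle HTTP requests.
      return [p for p in repo.rglob("*.py")
              if "request" in p.read_text(errors="ignore")]

  def audit(repo: Path) -> list[str]:
      findings = []
      for path in candidate_files(repo):
          prompt = PROMPT_TEMPLATE.format(code=path.read_text(errors="ignore"))
          findings.append(f"{path}:\n{ask_llm(prompt)}")
      return findings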

What bothers me the most about this article is that the tools that attackers use to do stuff like find 0days in code are the same tools that defenders can use to find the 0day first and fix it. It's not like offensive tooling is being developed in a vacuum and the world is ending as "armies of script kiddies" will suddenly drain every bank account in the world. Automated defense and code analysis is improving at a similar rate as automated offense.

In this awful article's defense though, I would argue that red team will always have an advantage over blue team, because blue team is by definition reactive. So as tech continues its exponential advancement, the advantage gap for the top 1% of red teamers is likely to scale accordingly.

tptacek · 11h ago
For the record I buy your argument about "vibe-coded malware"; this cycle of hype has been running since 1992 and Nowhere Man's "Virus Creation Lab". I am however fixated on the impact LLMs will have on vulnerability research, and what that will do to the ecosystem.
byt3bl33d3r · 9h ago
100% agree on its impact on research. It's pretty obvious that it'll accelerate 0day discovery, but standard defense-in-depth strategies prepare you for 0day vulns against your org.

It will be extremely interesting to see how vulnerability discovery evolves with LLMs but the whole "sky is falling hide your kids" hype cycle is ludicrous.

parliament32 · 11h ago
We are no closer to this than we are to "vibe-businessing", where you vibe your way into a profit-generating business powered by nothing but AI agents.
tptacek · 11h ago
(Looks around)

You know, there are some pretty crazy run rates out there.

Animats · 11h ago
The AI nightmare after that is "vibe capitalism". You put in a pitch deck and an operating business comes out.

Somebody should pitch that to YC.

blibble · 11h ago
this already exists, it's called a venture capital fund
jdefr89 · 11h ago
Alright folks, to qualify myself: I am a vulnerability researcher @ MIT. My day-to-day research concerns embedded hardware/software security, and some of my current/past endeavors involve AI/ML integration and understanding just how useful it actually is for finding/exploiting vulnerabilities. Just last week my lab hosted a conference that included MIT folks and the outsiders we invite; one talk was on the current state of AI/LLMs.

To keep things short, this article is sensationalized and overstates the utility of AI/ML for finding actual novel vulnerabilities. As it currently stands, LLMs cannot even reliably find bugs that other, less sophisticated tools could have found in much less time. Binary exploitation is a great place for illustrating the wall you'll hit using LLMs hoping for a 0day. While LLMs can help with things like setting up fuzzers or maybe giving you a place to start manual analysis, their utility kind of stops there. They cannot reliably catch memory corruption bugs that a basic fuzzer or sanitizer could have found within seconds. This makes sense for that class of bugs: LLMs are fuzzy logic, and these issues aren't reliably found with that paradigm. That's the whole reason we have fuzzers; they find subtle bugs worth triaging. You've seen how well LLMs count; it's no surprise they might miss many of the same things a human would but a fuzzer wouldn't (think UaF, OOB, etc.). All the other tools you see written for script kiddies yield the same number of false positives they could have gotten with other tools that already exist. I can go on and on, but I am on a shuttle, typing on a small phone.

TL;DR: The article is trying to say LLMs are super hackers already, and that's simply false. They definitely have allure for script kiddies. In the future this might change. LLMs' time-saving aspects are definitely worth checking out for static binary analysis. Binary Ninja with Sidekick saves a lot of time! But again, you still need to double-check important things!
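
For a concrete sense of what "a basic fuzzer or sanitizer could have found within seconds" means, here is a minimal coverage-guided harness sketch using Atheris (Google's Python fuzzer, pip install atheris). The toy parse_record() target and its deliberate bug are made-up assumptions for illustration; only the harness structure follows the standard Atheris pattern.

  import sys
  import atheris

  @atheris.instrument_func
  def parse_record(buf: bytes) -> bytes:
      # Toy parser with a deliberate bug: it trusts the length byte in the
      # input, so a short buffer raises IndexError -- the Python analogue of
      # an out-of-bounds read. A fuzzer trips over this almost immediately.
      if len(buf) < 1:
          return b""
      declared_len = buf[0]
      return bytes(buf[1 + i] for i in range(declared_len))

  def TestOneInput(data: bytes) -> None:
      parse_record(data)

  if __name__ == "__main__":
      atheris.Setup(sys.argv, TestOneInput)
      atheris.Fuzz()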
tptacek · 11h ago
I'm a vuln researcher too, and we just had an article here about another vuln researcher using o3 to find a zero-day remote Linux kernel vulnerability. And not in an especially human-directed way: they literally set up 100 runs of o3, using simonw's `llm` tool, and sifted through the results.
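
A rough sketch of that "many independent runs, then sift" workflow, not the referenced researcher's actual setup: simonw's `llm` CLI is used only because the comment mentions it, and the prompt, the "o3" model alias, and the crude keyword filter are all assumptions.

  import subprocess
  from pathlib import Path

  PROMPT = (
      "Audit the following kernel code for memory-safety bugs reachable "
      "from a remote peer. Report only concrete findings with the code path."
  )

  def one_run(code: str, model: str = "o3") -> str:
      # `llm` appends stdin to the prompt argument; the model alias is an assumption.
      out = subprocess.run(
          ["llm", "-m", model, PROMPT],
          input=code, capture_output=True, text=True, check=True,
      )
      return out.stdout

  def sift(code: str, runs: int = 100) -> list[str]:
      # Keep only runs that claim a concrete finding; a human still triages these.
      keywords = ("use-after-free", "out-of-bounds", "double free")
      results = [one_run(code) for _ in range(runs)]
      return [r for r in results if any(k in r.lower() for k in keywords)]

  if __name__ == "__main__":
      code = Path("target_module.c").read_text()   # hypothetical target file
      for i, candidate in enumerate(sift(code)):
          Path(f"candidate_{i}.txt").write_text(candidate)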

I'm having trouble reconciling what you wrote here with that result. Also with my own experiences, not necessarily of finding kernel vulnerabilities (I haven't had any need to do that for the last couple years), but of rapidly comprehending and analyzing kernel code (which I do need to do), and realizing how potent that ability would have been on projects 10 years ago.

I think you're wrong about this.

jdefr89 · 10h ago
I might be. Deepsleep also sort of found a bug, but you need to ask yourself: is it doing it better than tools we already have? Could a fuzzer have found that bug in less time? How far along did it really need to be pushed? And also, I have no doubt it probably trained on certain types of bugs in certain specific code bases. Did they test its ability to find the same bug after applying a couple of transforms that trip up the LLM? Can you link me to this article about o3? I have my doubts. I'd love to see the working exploit…

Also, if you throw these models at enough code bases, they will probably get lucky a couple of times. So far, every claim I have seen didn't stand up to rigorous scrutiny. People find one bug, then inflate their findings and write articles that would make you think they are far more effective than they really are, and I am tired of this hype.

curl had to stop accepting bounties after finding nearly all of 'em were just AI-generated nonsense…

Also, I stated that they indeed provide very large gains in certain areas, like writing a fuzz harness and reversing binaries. I am not saying they have absolutely no utility; I am simply tired of grifters attempting to inflate their findings for clout. Shit has gotten out of control.

tptacek · 9h ago
But that's exactly what people were saying about fuzzer farms in the mid-2000s, in the belief that artisanal audits would always be the dominant means of uncovering bugs. The truth was somewhere in between (it's still humans, but working at a higher layer of abstraction than they were before), but the fuzzer people were hugely right.

If you can reliably get X% lucky finding vulnerabilities for $Y in cost, then you simply scale that up to find more vulnerabilities.
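
A quick worked example of that scaling arithmetic, with made-up numbers (the cost and hit rate are assumptions, not data):

  cost_per_run = 20.0   # dollars per automated analysis run (assumption)
  hit_rate = 0.01       # chance a single run surfaces a real bug (assumption)
  runs = 300

  expected_findings = runs * hit_rate                   # 3.0 findings expected
  expected_cost_per_finding = cost_per_run / hit_rate   # $2,000 per finding

  print(f"expected findings from {runs} runs: {expected_findings:.1f}")
  print(f"expected cost per finding: ${expected_cost_per_finding:,.0f}")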

jdefr89 · 9h ago
I don’t recall anyone saying anything of the sort back then about fuzzing? Back then you could run the most basic fuzzer and find tons of bugs! Where did you see people complaining about fuzzers??
tptacek · 8h ago
If you go digging through the blogosphere of the time you'll turn it up. I feel like this is ~2006?
jdefr89 · 8h ago
Bro, in 2006 there wasn't any blogosphere. You had IRC... I can't find anything of the sort, and do you have a link to this discovery that you say was made via LLM?
tptacek · 7h ago
Bro, I've been a practitioner since 1996 and ran a fucking security blog in 2006.