Once upon a time, around 2001 or so, I had a static line at home and hosted some stuff on my home Linux box. A Windows NT update meant a lot of machines had this opportunistic encryption thing enabled, where Windows boxes would try to connect to a certain port and negotiate an S/WAN before doing TCP traffic. I was used to seeing this traffic a lot on my firewall, so no big deal. However, there was one machine in particular that was really obnoxious. It would try to connect every few seconds and would just not quit.
I tried to contact the admin of the box (yeah that’s what people used to do) and got nowhere. Eventually I sent a message saying “hey I see your machine trying to connect every few seconds on port <whatever it is>. I’m just sending a heads up that we’re starting a new service on that port and I want to make sure it doesn’t cause you any problems.”
Of course I didn’t hear back. Then I set up a server on that port that basically read from /dev/urandom, set TCP_NODELAY and a few other flags and pushed out random gibberish as fast as possible. I figured the clients of this service might not want their strings of randomness to be null-terminated so I thoughtfully removed any nulls that might otherwise naturally occur. The misconfigured NT box connected, drank 5 seconds or so worth of randomness, then disappeared. Then 5 minutes later, reappeared, connected, took its buffer overflow medicine and disappeared again. And this pattern then continued for a few weeks until the box disappeared from the internet completely.
I like to imagine that some admin was just sitting there scratching his head wondering why his NT box kept rebooting.
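For the curious, the trick is easy to reproduce. Here's a minimal sketch of that kind of gibberish server, assuming Python; the port is a placeholder since the original one isn't named above:

    import socket

    LISTEN_PORT = 5000  # placeholder; the story doesn't name the real port

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", LISTEN_PORT))
    srv.listen(5)

    with open("/dev/urandom", "rb") as rng:
        while True:
            conn, _addr = srv.accept()
            # TCP_NODELAY: push small writes out immediately instead of batching them.
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            try:
                while True:
                    # Strip nulls so the "strings of randomness" are never null-terminated.
                    conn.sendall(rng.read(4096).replace(b"\x00", b""))
            except OSError:
                pass  # client gave up (or fell over); wait for the next one
            finally:
                conn.close()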
kqr · 11h ago
The lesson for any programmers reading this is to always set an upper limit for how much data you accept from someone else. Every request should have both a timeout and a limit on the amount of data it will consume.
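A minimal sketch of what that looks like in practice, assuming Python and the requests library (the numbers are arbitrary):

    import requests

    MAX_BYTES = 10 * 1024 * 1024   # refuse to consume more than 10 MiB
    TIMEOUT = 10                   # seconds per connect/read

    def bounded_get(url):
        body = bytearray()
        with requests.get(url, timeout=TIMEOUT, stream=True) as resp:
            resp.raise_for_status()
            for chunk in resp.iter_content(chunk_size=64 * 1024):
                body.extend(chunk)
                if len(body) > MAX_BYTES:
                    raise ValueError("response exceeded size limit")
        return bytes(body)

Note that requests applies the timeout to individual socket reads, not the whole transfer, so a slow-drip server can still hold the connection for a while; the size cap is what bounds the total damage.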
keitmo · 3h ago
As a former boss used to say: "Unlimited is a bad idea."
eru · 11h ago
That doesn't necessarily need to be in the request itself.
You can also limit the wider process or system your request is part of.
kqr · 10h ago
While that is true, I recommend on the request anyway, because it makes it abundantly clear to the programmer that requests can fail, and failure needs to be handled somehow – even if it's by killing and restarting the process.
GTP · 7h ago
I second this: depending on the context, there might be a more graceful way of handling a response that's too long than crashing the process.
lazide · 6h ago
Though the issue with ‘too many bytes’ limits is that they tend to cause outages later, once time has passed and whatever the common size was is now ‘tiny’, like if you’re dealing with images, etc.
Time limits tend to also de facto limit size, if bandwidth is somewhat constrained.
kqr · 3h ago
Deliberately denying service in one user flow because technology has evolved is much better than accidentally denying service to everyone because some part of the system misbehaved.
Timeouts and size limits are trivial to update as legitimate need is discovered.
lazide · 2h ago
Oh man, I wish I could share some outage postmortems with you.
Practically speaking, putting an arbitrary size limit somewhere is like putting yet-another-ssl-cert-that-needs-to-be-renewed in some critical system. It will eventually cause an outage you aren’t expecting.
Will there be a plausible someone to blame? Of course. Realistically, it was also inevitable someone would forget and run right into it.
Time limits tend to not have this issue, for various reasons.
GTP · 2h ago
But not putting in the limits leaves the door open to a different class of outages in the form of buffer overflows, which additionally can pose a security risk, as they could be exploitable by an attacker.
Maybe this issue would be better solved at the protocol level, but in the meantime, size limits it is.
lazide · 1h ago
Nah, just OOM. Yes, there does need to be a limit somewhere - it just doesn’t need to be arbitrary; it should be based on some processing limit, and ideally it will adapt as, say, the memory footprint gets larger.
guappa · 7h ago
Then you kill your service which might also be serving legitimate users.
mkwarman · 12h ago
I enjoyed reading this, thank you for sharing. When you say you tried to contact the admin of the box and that this was common back then, how would you typically find the contact info for an arbitrary client's admin?
cobbaut · 9h ago
Back then things like postmaster@theirdomain and webmaster@theirdomain were read by actual people. Also the whois command often worked.
dspearson · 9h ago
I work for one of the largest Swiss ISPs, and these mailboxes are still to this day read by actual people (me included), so it's sometimes worthwhile even today.
NetOpWibby · 8h ago
I set up a new mail server with Stalwart and have been getting automated mails to my postmaster address (security threat results, mostly).
Pretty neat.
kqr · 11h ago
You can also find out who owns a general group of IP addresses, and at the time they would often assist you in further pinpointing who is responsible for a particular address.
DocTomoe · 11h ago
tech-c / abuse addresses were commonly available on whois.
ge96 · 1h ago
tangent
I had a lazy fix for down detection on my RPi server at home: it pinged a domain I owned, and if it couldn't hit that it assumed it wasn't connected to a network and rebooted itself. I let the domain lapse and this RPi kept going down every ~5 minutes... thought it was a power fault, then I remembered that CRON job.
mjmsmith · 22m ago
Around the same time, or maybe even earlier, some random company sent me a junk fax every Friday. Multiple polite voicemails to their office number were ignored, so I made a 100-page PDF where every page was a large black rectangle, and used one of the new-fangled email-to-fax gateways to send it to them. Within the hour, I got an irate call. The faxes stopped.
zerr · 4h ago
Didn't get why that WinNT box was connecting to your box. Due to some misconfigured Windows update procedure?
gigatexal · 11h ago
That’s awesome! Thank you for sharing.
layer8 · 20h ago
Back when I was a stupid kid, I once did
ln -s /dev/zero index.html
on my home page as a joke. Browsers at the time didn’t like that, they basically froze, sometimes taking the client system down with them.
Later on, browsers started to check for actual content I think, and would abort such requests.
bobmcnamara · 16h ago
I made a 64kx64k JPEG once by feeding the encoder the same line of macroblocks until it produced the entire image.
Years later I was finally able to open it.
opan · 16h ago
I had a ton of trouble opening a 10MB or so png a few weeks back. It was stitched together screenshots forming a map of some areas in a game, so it was quite large. Some stuff refused to open it at all as if the file was invalid, some would hang for minutes, some opened blurry. My first semi-success was Fossify Gallery on my phone from F-Droid. If I let it chug a bit, it'd show a blurry image, a while longer it'd focus. Then I'd try to zoom or pan and it'd blur for ages again. I guess it was aggressively lazy-loading. What worked in the end was GIMP. I had the thought that the image was probably made in an editor, so surely an editor could open it. The catch is that it took like 8GB of RAM, but then I could see clearly, zoom, and pan all I wanted. It made me wonder why there's not an image viewer that's just the viewer part of GIMP or something.
Among things that didn't work were qutebrowser, icecat, nsxiv, feh, imv, mpv. I did worry at first the file was corrupt, I was redownloading it, comparing hashes with a friend, etc. Makes for an interesting benchmark, I guess.
I'd say just curl/wget it, don't expect it to load in a browser.
Scaevolus · 16h ago
That's a 36,000x20,000 PNG, 720 megapixels. Many decoders explicitly limit the maximum image area they'll handle, under the reasonable assumption that it will exceed available RAM and take too long, and assume the file was crafted maliciously or by mistake.
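Pillow is a concrete example of this: it refuses outright to decode images past a pixel-count limit unless you opt in. A quick sketch (the filename is a placeholder):

    from PIL import Image

    # Default limit is roughly 89 million pixels; going past it raises
    # DecompressionBombWarning, and past twice the limit, DecompressionBombError.
    print(Image.MAX_IMAGE_PIXELS)

    # Opting in to a 36,000 x 20,000 image means raising the ceiling explicitly.
    Image.MAX_IMAGE_PIXELS = 800_000_000
    with Image.open("map.png") as im:   # placeholder filename
        print(im.size)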
swiftcoder · 52m ago
> don't expect it to load in a browser
Takes a few seconds, but otherwise seems pretty ok in desktop Safari. Preview.app also handles it fine (albeit does allocate an extra ~1-2GB of RAM)
lgeek · 14h ago
On Firefox on Android on my pretty old phone, a blurry preview rendered in about 10 seconds, and it was fully rendered in 20 something seconds. Smooth panning and zooming the entire time
connicpu · 10h ago
Firefox on a Samsung S23 Ultra did it a few seconds faster but otherwise the same experience
virtue3 · 15h ago
I use honey view for reading comics etc. It can handle this.
Old school acdsee would have been fine too.
I think it's all the pixel processing on the modern image viewers (or they're just using system web views that isn't 100% just a straight render).
I suspect that the more native renderers are doing some extra magic here. Or just being significantly more OK with using up all your ram.
jsnider3 · 33m ago
I get a Your connection was interrupted on Chrome.
Meneth · 4h ago
On my Waterfox 6.5.6, it opened but remained blurry when zoomed in.
MS Paint refused to open it.
The GIMP v2.99.18 crashed and took my display driver with it.
Windows 10 Photo Viewer surprisingly managed to open it and keep it sharp when zoomed in.
The GIMP v3.0.2 (latest version at the time of writing) crashed.
Moosdijk · 13h ago
It loads in about 5 seconds on an iPhone 12 using safari.
It also pans and zooms swiftly
avianlyric · 6h ago
Same, right up until I zoomed in and waited for Safari to produce a higher resolution render.
Partially zoomed in was fine, but zooming to maximum fidelity resulted in the tab crashing (it was completely responsive until the crash). Looks like Safari does some pretty smart progressive rendering, but forcing it to render the image at full resolution (by zooming in) causes the render to get OOMed or similar.
close04 · 8h ago
How strange, took at least 30s to load on my iPhone 12 Pro Max with Safari but it was smooth to pan and zoom after. Which is way better than my 16 core 64GB RAM Windows machine where both Chrome and Edge gave up very quickly, with a "broken thumbnail" icon.
GTP · 6h ago
Probably because they're based on the same engine.
close04 · 2h ago
The strangeness was that 2 iPhones from the same generation would exhibit such different performance behaviors, and in parallel the irony that a desktop browser (engine irrelevant) on a device with cutting edge performance can't do what a phone does.
promiseofbeans · 12h ago
Firefox on a mid-tier Samsung and a cheapo data connection (4G) took about 30s to load. I could pan, but it limited me from zooming much, and the little I could zoom in looked quite blurry.
bugfix · 15h ago
IrfanView was able to load it in about 8 seconds (Ryzen 7 5800x) using 2.8GB of RAM, but zooming/panning is quite slow (~500ms per action)
hdjrudni · 13h ago
IrfanView on my PC is very fast. Zoomed to 100% I can pan around no problem. Is it using CPU or GPU? I've got an 11900K CPU and RTX 3090.
ChoGGi · 3h ago
There are fast and slow resample viewing options in IrfanView; he may have the slow one turned on for higher quality.
beeslol · 15h ago
For what it's worth, this loaded (slowly) in Firefox on Windows for me (but zooming was blurry), and the default Photos viewer opened it no problem with smooth zooming and panning.
quickaccount · 16h ago
Safari on my MacBook Air opened it fine, though it took about four seconds. Zooming works fine as well. It does take ~3GB of memory according to Activity Monitor.
jaeckel · 11h ago
ImgurViewer from fdroid on an FP5 opened it blurry after around 5s and 5s later it was rendered completely.
Pan&zoom works instantly with a blurry preview and then takes another 5-10s to render completely.
spockz · 10h ago
Loading this on my iPhone on 1gbit took about 5s and I can easily pan and zoom. A desktop should handle it beautifully.
sixtyj · 10h ago
PDF files with included vector-based layers, e.g. plans or maps of a large area, are also quite difficult to render/open.
jve · 3h ago
Just today a colleague was looking at some air traffic permit map in a PDF that was like 12MB or something around that. He complained about Adobe Reader changing something so he couldn't pan/zoom anymore.
I suggested trying the HN-beloved Sumatra PDF. Ugh, it couldn't cope with it properly either. Chrome coped better.
radeeyate · 14h ago
Interestingly enough, it loads in about 5 seconds on my Pixel 6a.
arc-in-space · 9h ago
Oh hey it's the thing that ruins an otherwise okay rhythm game.
MaysonL · 14h ago
It loaded after 10-15 seconds on my iPad Pro M1, although it did start reloading after I looked around in it.
IamDaedalus · 8h ago
on mobile Brave just displayed it as the placeholder broken link image but in Firefox it loaded in about 10s
glial · 15h ago
It loads in about 10 seconds in Safari on an M1 Air. I think I am spoiled.
ninalanyon · 9h ago
Opens fine in Firefox 138.
DiggyJohnson · 10h ago
Safari on iPhone did a good job with it actually lol
ack_complete · 15h ago
I once encoded an entire TV OP into a multi-megabyte animated cursor (.ani) file.
Surprisingly, Windows 95 didn't die trying to load it, but quite a lot of operations in the system took noticeably longer than they normally did.
M95D · 9h ago
I wonder if I could create a 500TB html file with proper headers on a squashfs, an endless <div><div><div>... with no closing tags, and if I could instruct the server to not report file size before download.
Any ideas?
Ugohcet · 7h ago
Why use squashfs when you can do the same as OP did and serve a compressed version, so that the client is overwhelmed by both the decompression and the DOM depth?
The resulting file is about 15 MiB and decompresses into a 10 GiB monstrosity containing 1789569706 unclosed nested divs.
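A rough sketch of how such a payload can be generated, assuming Python; the counts are scaled down from the figures above and the filename is arbitrary:

    import gzip

    TOTAL_DIVS = 100_000_000          # ~500 MB once expanded; scale up to taste
    CHUNK = b"<div>" * 10_000         # write in chunks so nothing huge sits in RAM

    with gzip.open("divbomb.html.gz", "wb", compresslevel=9) as out:
        out.write(b"<!DOCTYPE html><html><body>")
        for _ in range(TOTAL_DIVS // 10_000):
            out.write(CHUNK)
        # deliberately no closing tags

Served with Content-Encoding: gzip, the file on disk stays tiny while the client has to materialize both the decompressed text and the absurdly deep DOM.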
sroussey · 1h ago
You can also just use code to endlessly serve up something.
Also, you can reverse many DoS vectors, depending on how you are set up and what your costs are. For example, reverse the Slowloris attack and use up their connections.
M95D · 4h ago
I like it. :)
CobrastanJorji · 9h ago
Yes, servers can respond without specifying the size by using chunked encoding. And you can do the rest with a custom web server that just handles request by returning "<div>" in a loop. I have no idea if browsers are vulnerable to such a thing.
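A minimal sketch of such a server using only the Python standard library, with the chunked framing written by hand so no Content-Length is ever sent (port and chunk size are arbitrary):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class DivTarpit(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"   # required for chunked transfer encoding

        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Transfer-Encoding", "chunked")
            self.end_headers()
            piece = b"<div>" * 1000
            try:
                while True:   # never send the terminating zero-length chunk
                    self.wfile.write(b"%x\r\n" % len(piece) + piece + b"\r\n")
            except (BrokenPipeError, ConnectionResetError):
                pass          # client gave up or fell over

    HTTPServer(("0.0.0.0", 8080), DivTarpit).serve_forever()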
konata390 · 8h ago
I just tested it via a small python script sending divs at a rate of ~900MB/s (as measured by curl), and Firefox just kills the request after 1-2 GB received (~2 seconds) with an "out of memory" error, while Chrome seems to only receive around 1MB/s, uses 1 CPU core at 100%, and grows infinitely in memory use. I killed it after 3 mins and consuming ca. 6GB (additionally, on top of the memory it used at startup).
M95D · 7h ago
What did the bots do?
M95D · 8h ago
I would make it an invisible link from the main page (hidden behind a logo or something). Users won't click it, but bots will.
stefs · 7h ago
The problem with this is that for a tarpit, you don't just want to make it expensive for bots, you also want to make it cheap for yourself. This isn't cheap for you. A zip bomb is.
m463 · 17h ago
Sounds like the favicon.ico that would crash the browser.
Looks like something I should add for my web APIs which are to be queried only by clients aware of the API specification.
koolba · 19h ago
I hope you weren’t paying for bandwidth by the KiB.
santoshalper · 17h ago
Nah, back then we paid for bandwidth by the kb.
slicktux · 16h ago
That’s even worse! :)
amelius · 8h ago
Maybe it's time for a /dev/zipbomb device.
GTP · 5h ago
ln -s /dev/urandom /dev/zipbomb && echo 'Boom!'
Ok, not a real zip bomb, for that we would need a kernel module.
Dwedit · 1h ago
That costs you a lot of bandwidth, defeating the whole point of a zip bomb.
M95D · 8h ago
Could server-side includes be used for a html bomb?
Write an ordinary static html page and fill a <p> with infinite random data using <!--#include file="/dev/random"-->.
or would that crash the server?
GTP · 5h ago
I guess it depends on the server's implementation. But since you need some logic to decide when to serve the HTML bomb anyway, I don't see why you would prefer this solution. Just use whatever script you're using to detect the bots to serve the bomb.
M95D · 4h ago
No other scripts. Hide the link to the bomb behind an image so humans can't click it.
AStonesThrow · 8h ago
Wait, you set up a symlink?
I am not sure how that could’ve worked. Unless the real /dev tree was exposed to your webserver’s chroot environment, this would’ve given nothing special except “file not found”.
The whole point of chroot for a webserver was to shield clients from accessing special files like that!
vidarh · 7h ago
You yourself explain how it could've worked: Plenty of webservers are or were not chroot'ed.
pandemic_region · 5h ago
Which means that if your bot is getting slammed by this, you can assume it's not chrooted and hence a more likely target for attack.
vidarh · 12m ago
This does not logically follow. If your bot is getting slammed by a page returning all zeros (what the person I replied to reacted to), all you know is something on the server is returning a neverending stream of zeros. A symlink to /dev/zero is an easy way of doing that, but knowing the server is serving up a neverending stream of zeros by no means tells you whether the server is running in a decently isolated environment or not.
Even if you knew it was done with a symlink you don't know that - these days odds are it'd run in a container or vm, and so having access to /dev/zero means very little.
"On 21 September 1997, the USS Yorktown halted for almost three hours during training maneuvers off the coast of Cape Charles, Virginia due to a divide-by-zero error in a database application that propagated throughout the ship’s control systems."
" technician tried to digitally calibrate and reset the fuel valve by entering a 0 value for one of the valve’s component properties into the SMCS Remote Database Manager (RDM)"
astolarz · 17h ago
Bad bot
fuzztester · 16h ago
I remember reading about that some years ago. It involved Windows NT.
Though, bots may not support modern compression standards. Then again, that may be a good way to block bots: every modern browser supports zstd, so just force that on non-whitelisted browser agents and you automatically confuse scrapers.
andersmurphy · 7h ago
So I actually do this (use compression to filter out bots) for my one million checkboxes Datastar demo[1]. It relies heavily on streaming the whole user view on every interaction. With brotli over SSE you can easily hit 200:1 compression ratios[2]. The problem is a malicious actor could request the stream uncompressed. As brotli is supported by 98% of browsers I don't push data to clients that don't support brotli compression. I've also found a lot of scrapers and bots don't support it so it works quite well.
If you nest the gzip inside another gzip it gets even smaller since the blocks of compressed '0' data are themselves low entropy in the first generation gzip. Nested zst reduces the 10G file to 99 bytes.
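The effect is easy to see from Python, scaled down to 100 MiB of zeros (exact sizes vary by zlib version):

    import gzip

    zeros = bytes(100 * 1024 * 1024)              # 100 MiB of 0x00
    once = gzip.compress(zeros, compresslevel=9)
    twice = gzip.compress(once, compresslevel=9)  # the gzip stream is itself repetitive
    print(len(zeros), len(once), len(twice))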
galangalalgol · 13h ago
Can you hand edit to create recursive file structures to make it infinite? I used to use debug in DOS to make what appeared to be gigantic floppy discs by editing the FAT.
That's what I was hoping for with the original article.
Thorrez · 5h ago
But the bot likely only automatically unpacks the outer layer. So nesting doesn't help with bot deterrence.
Cloudef · 13h ago
Wouldn't that defeat the attack though, as you aren't serving the large content anymore?
kevin_thibedeau · 13h ago
It would need a bot that is accessing files via hyperlink with an aim to decompress them and riffle through their contents. The compressed file can be delivered over a compressed response to achieve the two layers, cutting down significantly on the outbound traffic. passwd.zst, secrets.docx, etc. would look pretty juicy. Throw some bait in honeypot directories (exposed for file access) listed in robots.txt and see who takes it.
xiaoyu2006 · 12h ago
How will my browser react to receiving such bombs? I’d rather not test it myself…
jeroenhd · 8h ago
Last time I checked, the tab keeps loading, freezes, and the process that's assigned to rendering the tab gets killed when it eats too much RAM. Might cause a "this tab is slowing down your browser" popup or general browser slowness, but nothing too catastrophic.
How bad the tab process dying is, depends per browser. If your browser does site isolation well, it'll only crash that one website and you'll barely notice. If that process is shared between other tabs, you might lose state there. Chrome should be fine, Firefox might not be depending on your settings and how many tabs you have open, with Safari it kind of depends on how the tabs were opened and how the browser is configured. Safari doesn't support zstd though, so brotli bombs are the best you can do with that.
anthk · 5h ago
gzip is everywhere and it will mess with every crawler.
bilekas · 20h ago
> At my old employer, a bot discovered a wordpress vulnerability and inserted a malicious script into our server
I know it's slightly off topic, but it's just so amusing (edit: reassuring) to know I'm not the only one who, within an hour of setting up WordPress, finds a PHP shell magically deployed on their server.
protocolture · 17h ago
>Take over a wordpress site for a customer
>Oh look 3 separate php shells with random strings as a name
Never less than 3, but always guaranteed.
ianlevesque · 19h ago
Yes, never self host Wordpress if you value your sanity. Even if it’s not the first hour it will eventually happen when you forget a patch.
sunaookami · 19h ago
Hosting WordPress myself for 13 years now and have no problem :) Just follow standard security practices and don't install gazillion plugins.
ozim · 5h ago
I have better things to do with my time so I happily pay someone else to host it for me.
carlosjobim · 19h ago
There's a lot of essential functionality missing from WordPress, meaning you have to install plugins. Depending on what you need to do.
But it's such a bad platform that there really isn't any reason for anybody to use WordPress for anything. No matter your use case, there will be a better alternative to WordPress.
aaronbaugher · 19h ago
Can you recommend an alternative for a non-technical organization, where there's someone who needs to be able to edit pages and upload documents on a regular basis, so they need as user-friendly an interface as possible for that? Especially when they don't have a budget for it, and you're helping them out as a favor? It's so easy to spin up Wordpress for them, but I'm not a fan either.
I've tried Drupal in the past for such situations, but it was too complicated for them. That was years ago, so maybe it's better now.
vinceguidry · 16s ago
Wiki software is the way to go here.
ufmace · 13h ago
I find it very telling that there's no 2 responses to this post recommending the same thing. Confirms my belief that there is no real alternative to Wordpress for a free and open-source CMS that is straightforward to install and usable to build and edit pages by non-tech-experts.
eru · 10h ago
Perhaps people who wanted to recommend the same thing as was already written, just upvoted instead of writing their own comment?
Pretty sure Drupal has been around for like, 20 years or so. Or is this a different Drupal?
nulbyte · 5h ago
Drupal has been around for a while, but I've never heard of "Drupal CMS" as a separate product until now.
It appears Drupal CMS is a customized version of Drupal that is easier for less tech-savvy folks to get up and running. At least, that's the impression I got reading through the marketing hype that "explains" it with nothing but buzzwords.
donnachangstein · 19h ago
> Can you recommend an alternative for a non-technical organization, where there's someone who needs to be able to edit pages and upload documents on a regular basis, so they need as user-friendly an interface as possible for that
25 years ago we used Microsoft Frontpage for that, with the web root mapped to a file share that the non-technical secretary could write to and edit it as if it were a word processor.
Somehow I feel we have regressed from that simplicity, with nothing but hand waving to make up for it. This method was declared "obsolete" and ... Wordpress kludges took its place as somehow "better". Someone prove me wrong.
shakna · 9h ago
Part of that is Frontpage needing a Windows server, and all that entails.
The other part is clients freaking out after Frontpage had a series of dangerous CVEs all in a row.
And then finally every time a part of Frontpage got popular, MS would deprecate the API and replace it with a new one.
Wordpress was in the right place at the right time.
aaronbaugher · 3h ago
Yeah, getting Frontpage working on a Linux/Apache system and supporting it back then wasn't exactly a treat. Good idea, maybe, but bad implementation.
MrDOS · 5h ago
For those on macOS, RapidWeaver still exists: https://www.realmacsoftware.com/rapidweaver/. (Shame that it's now subscriptionware, though – could've sworn it used to be an outright purchase per major version.)
bigfatkitten · 17h ago
A previous workplace of mine did the same with Netscape (and later, Mozilla) Composer. Users could modify content via WebDAV.
blipvert · 10h ago
We have a (internally accessible only) WP instance where the content is exported using a plugin as a ZIP file and then deployed to NGINX servers with a bit of scripting/Ansible.
Could be automated better (drop ZIP to a share somewhere where it gets processed and deployed) but best of both worlds.
YES! I have switched to it for professional and personal CMS work and it's great. Incredibly flexible and simplistic in my opinion. I use it both as headful and headless.
rpmisms · 15h ago
Seconded. It's absolutely phenomenal as a headful or headless CMS.
1oooqooq · 4h ago
weird "license" on that project. pretty much blocks any self host usage besides a personal blog.
And only hosted option for the copyrighted code starts at 300/y
these don't cover any use case people use WordPress for.
Jekyll and other static site generators do not replace WordPress any more than Notepad replaces MS Word.
In one, multiple users can login, edit WYSIWYG, preview, add images, etc, all from one UI. You can access it from any browser including smart phones and tablets.
In the other, you get to instruct users on git, how to deal with merge conflicts, code review (two people can't easily work on a post like they can in WordPress), previews require a manual build, and you need a local checkout and a local build installation to do the build. There's no WYSIWYG, adding images is a manual process of copying a file, figuring out the URL, etc... No smartphone/tablet support. etc....
I switched my blog from a WordPress install to a static site generator because I got tired of having to keep it up to date, but my posting dropped because the friction of posting went way up. I could no longer post from a phone. I couldn't easily add images. I had to build to preview. And I had to submit via git commits and pushes. All of that meant what was easy became tedious.
What are your favorite static site generators? I googled it and a Cloudflare article came up with Jekyll, Gatsby, Hugo, Next.js, and Eleventy. But I would like to avoid doing research on the pros/cons of each if it can be helped.
socalgal2 · 13h ago
I looked recently when thinking of starting some new shared blog. My criteria was "based on tech I know". I don't know Ruby so Jekyll was out. I tried Eleventy and Hexo. I chose Hexo but then ultimately decided I wasn't going to do this new blog.
IIRC, Eleventy printed lots of out-of-date warnings when I installed it and/or the default style was broken in various ways which didn't give me much confidence.
My younger sister asked me to help her start a blog. I just pointed her to substack. Zero effort, easy for her.
pmontra · 10h ago
I work with Ruby but I never had to use Ruby to use Jekyll. I downloaded the docker image and run it. It checks a host directory for updates and generates the HTML files. It could be written in any other language I don't know.
justusthane · 16h ago
I don’t have much experience with other SSGs, but I’ve been using Eleventy for my personal site for a few years and I’m a big fan. It’s very simple to get started with, it’s fast to build, it’s powerful and flexible.
I build mine with GitHub Actions and host it free on Pages.
Tistron · 9h ago
I've come to really appreciate Astro.js
It's quite simple to get started, fairly intuitive for me, and very powerful.
beeburrt · 17h ago
Jekyll and GitHub pages go together pretty well.
msh · 6h ago
It's sad that software like CityDesk died and did not evolve into multiuser applications.
carlosjobim · 18h ago
Yes I can. There's an excellent and stable solution called SurrealCMS, made by an indie developer. You connect it by FTP to any traditional web design (HTML+CSS+JS), and the users get a WYSIWYG editor where the published output looks exactly as it looked when editing. It's dirt cheap at $9 per month.
Edit: I actually feel a bit sorry for the SurrealCMS developer. He has a fantastic product that should be an industry standard, but it's fairly unknown.
Then WordPress is just your private CMS/UI for making changes, and it generates static files that are uploaded to a webhost like CloudFlare Pages, GitHub Pages, etc.
sureIy · 1h ago
It has been a long time since I tried that, but it was never as simple as they claimed it to be.
Now that plugin became a service, at which point you might just use a WP host and let them do their thing.
dmje · 9h ago
Just not true, although entirely aligned with HN users who often believe that the levels of nerdery on HN are common in the real world. WP isn’t bad, you’ve just done it wrong, and there really isn’t a better alternative for hundreds and hundreds of use cases..
carlosjobim · 5h ago
My perspective is that WordPress is too complicated and too nerdy for most real world users. They are usually better off with a solution that is tailor made for their use case. And there's plenty of such solutions. Even for blogging, there are much better solutions than WordPress for non-technical users.
wincy · 19h ago
I do custom web dev so am way out of the website hosting game. What are good frameworks now if I want to say, light touch help someone who is slightly technical set up a website? Not full react SPA with an API.
carlosjobim · 18h ago
By the sound of your question I will guess you want to make a website for a small or medium sized organization? jQuery is probably the only "framework" you should need.
If they are selling anything on their website, it's probably going to be through a cloud hosted third party service and then it's just an embedded iframe on their website.
If you're making an entire web shop for a very large enterprise or something of similar magnitude, then you have to ask somebody else than me.
felbane · 18h ago
Does anyone actually still use jQuery?
Everything I've built in the past like 5 years has been almost entirely pure ES6 with some helpers like jsviews.
karaterobot · 17h ago
jQuery's still the third most used web framework, behind React and before NextJS. If you use jQuery to build Wordpress websites, you'd be specializing in popular web technologies in the year 2025.
I've seen this site linked for many years among web devs, but I just don't understand the purpose? jQuery code is much cleaner and easier to understand, and there's a great amount of solutions written for jQuery available online for almost any need you have.
j16sdiz · 14h ago
The vanilla one is so much longer.
arcfour · 17h ago
Never use that junk if you value your sanity, I think you mean.
ufmace · 13h ago
Ditto: self-hosting WordPress works fine with standard hosting practices and not installing a bazillion random plugins.
maeln · 9h ago
I never hosted WP, but as soon as you have an HTTP server exposed to the internet you will get requests to /wp-login and such. It has become a good way to find bots, too. If I see an IP requesting anything from a popular CMS, hop, into the iptables hole it goes.
Perz1val · 8h ago
Hey, I check /wp-admin sometimes when I see a website and it has a certain feel to it
victorbjorklund · 8h ago
I do the same. Great way to filter our security scanners.
Aransentin · 8h ago
Wordpress is indeed a nice backdoor, it even has CMS functionality built in.
dx4100 · 16h ago
There are ways to prevent it:
- Freeze all code after an update through permissions
- Don't make most directories writeable
- Don't allow file uploads, or limit file uploads to media
There's a few plugins that do this, but vanilla WP is dangerous.
colechristensen · 17h ago
>after 1 hour
I've used this teaching folks devops, here deploy your first hello world nginx server... huh what are those strange requests in the log?
ChuckMcM · 20h ago
I sort of did this with ssh where I figured out how to crash an ssh client that was trying to guess the root password. What I got for my trouble was a number of script kiddies ddosing my poor little server. I switched to just identifying 'bad actors' who are clearly trying to do bad things and just banning their IP with firewall rules. That's becoming more challenging with IPV6 though.
Edit: And for folks who write their own web pages, you can always create zip bombs that are links on a web page that don't show up for humans (white text on white background with no highlight on hover/click anchors). Bots download those things to have a look (so do crawlers and AI scrapers)
grishka · 15h ago
> you can always create zip bombs that are links on a web page that don't show up for humans
I did a version of this with my form for requesting an account on my fediverse server. The problem I was having is that there exist these very unsophisticated bots that crawl the web and submit their very unsophisticated spam into every form they see that looks like it might publish it somewhere.
First I added a simple captcha with distorted characters. This did stop many of the bots, but not all of them. Then, after reading the server log, I noticed that they only make three requests in rapid succession: the page that contains the form, the captcha image, and then the POST request with the form data. They load neither the CSS nor the JS.
So I added several more fields to the form and hid them with CSS. Submitting anything in these fields will fail the request and ban your session. I also modified the captcha, I made the image itself a CSS background, and made the src point to a transparent image instead.
And just like that, spam has completely stopped, while real users noticed nothing.
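A minimal server-side sketch of the same honeypot-field idea, assuming Flask; the field name and CSS class are made up:

    from flask import Flask, abort, request, render_template_string

    app = Flask(__name__)

    FORM = """
    <style>.hp { display: none; }</style>
    <form method="post">
      <input name="username">
      <input name="website" class="hp" autocomplete="off">  <!-- humans never see this -->
      <button>Request account</button>
    </form>
    """

    @app.route("/signup", methods=["GET", "POST"])
    def signup():
        if request.method == "POST":
            if request.form.get("website"):
                abort(403)   # hidden field was filled in: treat as a bot
            return "ok"      # real signup handling would go here
        return render_template_string(FORM)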
anamexis · 12h ago
I did essentially the same thing. I have this input in a form:
And any form submission with a value set for the email is blocked. It stopped 100% of the spam I was getting.
DuncanCoffee · 7h ago
Would this also stop users with automatic form filling enabled?
grishka · 7h ago
No, `autocomplete="off"` takes care of that
BarryMilo · 12h ago
We used to just call those honeypot fields. Works like a charm.
a_gopher · 2h ago
apart from blind users, who are also now completely unable to use their screenreaders with your site
ChuckMcM · 14h ago
Oh that is great.
dsp_person · 13h ago
> you can always create zip bombs that are links on a web page that don't show up for humans (white text on white background with no highlight on hover/click anchors)
RIP screen reader users?
some-guy · 13h ago
“aria-hidden” would spare those users, and possibly be ignored by the bots unless they are sophisticated.
j_walter · 20h ago
Check this out if you want to stop this behavior...
> I sort of did this with ssh where I figured out how to crash an ssh client that was trying to guess the root password. What I got for my trouble was a number of script kiddies ddosing my poor little server.
This is the main reason I haven't installed zip bombs on my website already -- on the off chance I'd make someone angry and end up having to fend off a DDoS.
Currently I have some URL patterns to which I'll return 418 with no content, just to save network / processing time (since if a real user encounters a 404 legitimately, I want it to have a nice webpage for them to look at).
Should probably figure out how to wire that into fail2ban or something, but not a priority at the moment.
1970-01-01 · 20h ago
Why is it harder to firewall them with IPv6? I seems this would be the easier of the two to firewall.
carlhjerpe · 18h ago
Manual banning is about the same since you just block /56 or bigger, entire providers or countries.
Automated banning is harder, you'd probably want a heuristic system and look up info on IPs.
IPv4 with NAT means you can "overban" too.
malfist · 16h ago
Why wouldn't something like fail2ban work here? That's what it's built for and it has been around for eons.
ozim · 3h ago
Fun part was that fail2ban had an RCE vulnerability. So you were more secure not running it. Now it should be fixed, but can you be sure?
carlhjerpe · 6h ago
You don't always firewall 80/443 in Linux :(
firesteelrain · 19h ago
I think they are suggesting the range of IPs to block is too high?
CBLT · 18h ago
Allow -> Tarpit -> Block should be done by ASN
carlhjerpe · 18h ago
You probably want to check how many IPs/blocks a provider announces before blocking the entire thing.
It's also not a common metric you can filter on in open firewalls, since you must look up and maintain a cache of IP-to-ASN mappings, which has to be evicted and updated as blocks still move around.
echoangle · 19h ago
Maybe it’s easier to circumvent because getting a new IPv6 address is easier than with IPv4?
flexagoon · 13h ago
Automated systems like Cloudflare and stuff also have a list of bot IPs. I was recently setting up a selfhosted VPN and I had to change the IPv4 of the server like 20 times before I got an IP that wasn't banned on half the websites.
bjoli · 11h ago
I am just banning large swaths of IPs. Banning most of Asia and the middle east reduced the amount of bad traffic by something like 98%.
leephillips · 18h ago
These links do show up for humans who might be using text browsers, (perhaps) screen readers, bookmarklets that list the links on a page, etc.
alpaca128 · 6h ago
Weird that text browsers just ignore all the attributes that hide elements. I get that they don't care about styling, but even a plain hidden attribute or aria-hidden are ignored.
ChuckMcM · 18h ago
true, but you can make the link text 'do not click this' or 'not a real link' to let them know. I'm not sure if crawlers have started using LLMs to check pages or not which would be a problem.
marcusb · 19h ago
Zip bombs are fun. I discovered a vulnerability in a security product once where it wouldn’t properly scan a file for malware if the file was or contained a zip archive greater than a certain size.
The practical effect of this was you could place a zip bomb in an office xml document and this product would pass the ooxml file through even if it contained easily identifiable malware.
secfirstmd · 18h ago
Eh I got news for ya.
The file size problem is still an issue for many big name EDRs.
marcusb · 18h ago
Undoubtedly. If you go poking around most any security product (the product I was referring to was not in the EDR space,) you'll see these sorts of issues all over the place.
j16sdiz · 14h ago
It has to be the way it is.
Scanning them is resource intensive.
The choices are: (1) skip scanning them; (2) treat them as malware; (3) scan them and be DoS'ed.
(Deferring the decision to a human is effectively DoS'ing your IT support team.)
avidiax · 13h ago
Option #4, detect the zip bomb in its compressed form, and skip over that section of the file. Just like the malware ignores the zip bomb.
im3w1l · 13h ago
Just the fact that it contains a zip bomb makes it malware by itself.
marcusb · 5h ago
It does not have to be the way it is. Security vendors could do a much better job testing and red teaming their products to avoid bypasses, and have more sensible defaults.
LordGrignard · 15h ago
is that endpoint detection and response?
marcusb · 6h ago
Yes
kazinator · 19h ago
I deployed this, instead of my usual honeypot script.
It's not working very well.
In the web server log, I can see that the bots are not downloading the whole ten megabyte poison pill.
They are cutting off at various lengths. I haven't seen anything fetch more than around 1.5 MB of it so far.
Or is it working? Are they decoding it on the fly as a stream, and then crashing? E.g. if something is recorded as having read 1.5 MB, could it have decoded it to 1.5 GB in RAM, on the fly, and crashed?
There is no way to tell.
MoonGhost · 19h ago
Try a content labyrinth, i.e. infinitely generated content with a bunch of references to other generated pages. It may help against simple wget, at least until bots adapt.
PS: I'm on the bots' side, but don't mind helping.
palijer · 16h ago
This doesn't work if you pay for bandwidth and CPU usage on your servers, though.
Twirrim · 11h ago
The labyrinth doesn't have to be fast, and things like iocaine (https://iocaine.madhouse-project.org/) don't use much CPU if you don't go and give them something like the Complete Works of Shakespeare as input (mine is using Moby Dick), and can easily be constrained with cgroups if you're concerned about resource usage.
I've noticed that LLM scrapers tend to be incredibly patient. They'll wait for minutes for even small amounts of text.
MoonGhost · 14h ago
That will be your contribution. If others join, scraping will become very pricey. Till bots become smarter. But then they will not download much of the generated crap, which makes it cheaper for you.
Anyway, from the bots' perspective, labyrinths aren't the main problem. The internet is being flooded with quality LLM-generated content.
bugfix · 15h ago
Wouldn't this just waste your own bandwidth/resources?
gwd · 4h ago
Kinda wonder if a "content labyrinth" could be used to influence the ideas / attitudes of bots -- fill it with content pro/anti Communism, or Capitalism, or whatever your thing is, hope it tips the resulting LLM towards your ideas.
arctek · 15h ago
Perhaps need to semi-randomize the file size?
I'm guessing some of the bots have a hard limit to the size of the resource they will download.
Many of these are annoying LLM training/scraping bots (in my case anyway).
So while it might not crash them if you spit out a 800KB zipbomb, at least it will waste computing resources on their end.
unnouinceput · 19h ago
Do they come back? If so, then they detect it and avoid it. If not, then they crashed and mission accomplished.
kazinator · 18h ago
I currently cannot tell without making a little configuration change, because as soon as an IP address is logged as having visited the trap URL (honeypot, or zipbomb or whatever), a log monitoring script bans that client.
Secondly, I know that most of these bots do not come back. The attacks do not reuse addresses against the same server in order to evade almost any conceivable filter rule that is predicated on a prior visit.
jpsouth · 7h ago
I may be asking a really silly question here, but
> as soon as an IP address is logged as having visited the trap URL (honeypot, or zipbomb or whatever), a log monitoring script bans that client.
Is this not why they aren’t getting the full file?
kazinator · 3h ago
I believe Apache is logging complete requests. For instance, in the case of clients sent to a honeypot, I see a log entry appear when I pick a honeypot script from the process listing and kill it. That could be hours after the client connected.
The timestamps logged are connection time not completion time. E.g. here is a pair of consecutive logs:
Notice the second timestamp is almost ten minutes earlier.
KTibow · 20h ago
It's worth noting that this is a gzip bomb (acts just like a normal compressed webpage), not a classical zip file that uses nested zips to knock out antiviruses.
tga_d · 15h ago
There was an incident a little while back where some Tor Project anti-censorship infrastructure was run on the same site as a blog post about zip bombs.[0] One of the zip files got crawled by Google, and added to their list of malicious domains, which broke some pretty important parts of Tor's Snowflake tool. Took a couple weeks to get it sorted out.[1]
IsMalicious() doing some real heavy lifting in that pseudo code. Would love to see a bit more under THAT hood.
seethishat · 3h ago
It's probably watching for connections to files listed in robots.txt that should not be crawled, etc. Once a client tries to do that thing (which it was told not to do), then it gets tagged malicious and fed the zip file.
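One plausible shape for that IsMalicious() check, sketched in Python; the trap prefix and the in-memory ban list are illustrative, not what the article actually does:

    # /trap/ is listed under Disallow in robots.txt and never linked anywhere
    # a human would find it, so only misbehaving crawlers ever touch it.
    TRAP_PREFIX = "/trap/"
    banned_ips = set()

    def is_malicious(client_ip, path):
        if path.startswith(TRAP_PREFIX):
            banned_ips.add(client_ip)   # remember the offender
        return client_ip in banned_ips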
gherard5555 · 6h ago
There is a similar thing for ssh servers, called endlessh (https://github.com/skeeto/endlessh). In the ssh protocol the client must wait for the server to send back a banner when it first connects, but there is no limit on its size! So this program will send an infinite banner very ... very slowly, and make the crawler/script kiddie's script hang indefinitely or just crash.
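endlessh itself is a small C program, but the core trick fits in a few lines of Python; a sketch with made-up port and timing values:

    import random
    import socket
    import threading
    import time

    def tarpit(conn):
        try:
            while True:
                # Lines sent before the real "SSH-2.0-..." version string are
                # treated as a pre-banner, so the client just keeps waiting.
                conn.sendall(b"%x\r\n" % random.getrandbits(32))
                time.sleep(10)   # drip-feed to keep the client tied up
        except OSError:
            conn.close()

    srv = socket.create_server(("0.0.0.0", 2222))   # decoy port, not the real sshd
    while True:
        conn, _addr = srv.accept()
        threading.Thread(target=tarpit, args=(conn,), daemon=True).start()

(The real endlessh multiplexes thousands of victims with a single poll loop instead of a thread per connection.)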
Hilarious because the author, and the OP author, are literally zipping `/dev/null`. While they realize that it "doesn't take disk space nor ram", I feel like the coin didn't drop for them.
Other than that, why serve gzip anyway? I would not set the Content-Length Header and throttle the connection and set the MIME type to something random, hell just octet-stream, and redirect to '/dev/random'.
I don't get the 'zip bomb' concept, all you are doing is compressing zeros. Why not compress '/dev/random'? You'll get a much larger file, and if the bot receives it, it'll have a lot more CPU cycles to churn.
Even the OP article states that after creating the '10GB.gzip', 'The resulting file is 10MB in this case.'
Is it because it sounds big?
Here is how you don't waste time with 'zip bombs':
$ time dd if=/dev/zero bs=1 count=10M | gzip -9 > 10M.gzip
10485760+0 records in
10485760+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 9.46271 s, 1.1 MB/s
real 0m9.467s
user 0m2.417s
sys 0m14.887s
$ ls -sh 10M.gzip
12K 10M.gzip
$ time dd if=/dev/random bs=1 count=10M | gzip -9 > 10M.gzip
10485760+0 records in
10485760+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 12.5784 s, 834 kB/s
real 0m12.584s
user 0m3.190s
sys 0m18.021s
$ ls -sh 10M.gzip
11M 10M.gzip
onethumb · 1m ago
The whole point is for it to cost less (ie, smaller size) for the sender and cost more (ie, larger size) for the receiver.
The compression ratio is the whole point... if you can send something small for next to no $$ which causes the receiver to crash due to RAM, storage, compute, etc constraints, you win.
wewewedxfgdf · 20h ago
I protected uploads on one of my applications by creating fixed size temporary disk partitions of like 10MB each and unzipping to those contains the fallout if someone uploads something too big.
warkdarrior · 20h ago
`unzip -p | head -c 10MB`
kccqzy · 15h ago
Doesn't deal with multi-file ZIP archives. And before you think you can just reject user uploads with multi-file ZIP archives, remember that macOS ZIP files contain the __MACOSX folder with ._ files.
sidewndr46 · 20h ago
What? You partitioned a disk rather than just not decompressing some comically large file?
2048 yottabyte Zip Bomb
This zip bomb uses overlapping files and recursion to achieve 7 layers with 256 files each, with the last being a 32GB file.
It is only 266 KB on disk.
When you realise it's a zip bomb it's already too late. Looking at the file size doesn't betray its contents. Maybe applying some heuristics with ClamAV? But even then it's not guaranteed. I think a small partition to isolate decompression is actually really smart. Wonder if we can achieve the same with overlays.
sidewndr46 · 20h ago
What are you talking about? You get a compressed file. You start decompressing it. When the amount of bytes you've written exceeds some threshold (say 5 megabytes) just stop decompressing, discard the output so far & delete the original file. That is it.
AndrewStephens · 17h ago
I worked on a commercial HTTP proxy that scanned compressed files. Back then we would start to decompress a file but keep track of the compression ratio. I forget what the cutoff was but as soon as we saw a ratio over a certain threshold we would just mark the file as malicious and block it.
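That ratio check is easy to reproduce with zlib's streaming interface; a sketch, with an arbitrary threshold:

    import zlib

    MAX_RATIO = 100        # flag anything that expands more than 100:1
    CHUNK = 64 * 1024

    def scan_gzip_stream(fileobj):
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)   # gzip framing
        read = written = 0
        while True:
            chunk = fileobj.read(CHUNK)
            if not chunk:
                break
            read += len(chunk)
            written += len(d.decompress(chunk))
            if written > read * MAX_RATIO:
                raise ValueError("compression ratio too high; treating as malicious")
        return written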
tremon · 20h ago
That assumes they're using a stream decompressor library and are feeding that stream manually. Solutions that write the received file to $TMP and just run an external tool (or, say, use sendfile()) don't have the option to abort after N decompressed bytes.
overfeed · 18h ago
> Solutions that write the received file to $TMP and just run an external tool (or, say, use sendfile()) don't have the option to abort after N decompressed bytes
cgroups with hard-limits will let the external tool's process crash without taking down the script or system along with it.
pessimizer · 16h ago
> cgroups with hard-limits
This is exactly the same idea as partitioning, though.
messe · 5h ago
> That assumes they're using a stream decompressor library and are feeding that stream manually. Solutions that write the received file to $TMP and just run an external tool (or, say, use sendfile()) don't have the option to abort after N decompressed bytes.
In a practical sense, how's that different from creating a N-byte partition and letting the OS return ENOSPC to you?
gruez · 20h ago
Depending on the language/library that might not always be possible. For instance python's zip library only provides an extract function, without a way to hook into the decompression process, or limit how much can be written out. Sure, you can probably fork the library to add in the checks yourself, but from a maintainability perspective it might be less work to do with the partition solution.
banana_giraffe · 18h ago
It also provides an open function for the files in a zip file. I see no reason something like this won't bail after a small limit:
    import zipfile

    with zipfile.ZipFile("zipbomb.zip") as zip:
        for name in zip.namelist():
            print("working on " + name)
            left = 1000000
            with open("dest_" + name, "wb") as fdest, zip.open(name) as fsrc:
                while True:
                    block = fsrc.read(1000)
                    if len(block) == 0:
                        break
                    fdest.write(block)
                    left -= len(block)
                    if left <= 0:
                        print("too much data!")
                        break
maxbond · 18h ago
That is exactly what OP is doing, they've just implemented it at the operating system/file system level.
gchamonlive · 20h ago
Those files are designed to exhaust the system resources before you can even do these kinds of checks. I'm not particularly familiar with the ins and outs of compression algorithms, but it's intuitively not strange for me to have a zip that is carefully crafted so that memory and CPU go out the window before any check can be done. Maybe someone with more experience can give more details.
I'm sure though that if it was as simple as that we wouldn't even have a name for it.
crazygringo · 18h ago
Not really. It really is that simple. It's just dictionary decompression, and it's just halting it at some limit.
It's just nobody usually implements a limit during decompression because people aren't usually giving you zip bombs. And sometimes you really do want to decompress ginormous files, so limits aren't built in by default.
Your given language might not make it easy to do, but you should pretty much always be able to hack something together using file streams. It's just an extra step is all.
gchamonlive · 5h ago
I honestly thought it was harder. It's still a burden on the developer to use the tools in the intended way so that the application isn't vulnerable, so it's something to keep in mind when implementing functionality that requires unpacking user provided compressed archives.
kulahan · 18h ago
Isn’t this basically a question about the halting problem? Whatever arbitrary cutoff you chose might not work for all.
kam · 17h ago
No, compression formats are not Turing-complete. You control the code interpreting the compressed stream and allocating the memory, writing the output, etc. based on what it sees there and can simply choose to return an error after writing N bytes.
eru · 10h ago
Yes, and even if they were Turing complete, you could still run your Turing-machine-equivalent for n steps only before bailing.
Rohansi · 17h ago
Not really. It's easy to abort after exceeding a number of uncompressed bytes or files written. The problem is the typical software for handling these files does not implement restrictions to prevent this.
est · 13h ago
damn, it broke the macOS archiver utility.
kccqzy · 20h ago
Seems like a good and simple strategy to me. No real partition needed; tmpfs is cheap on Linux. Maybe OP is using tools that do not easily allow tracking the number of uncompressed bytes.
wewewedxfgdf · 19h ago
Yes I'd rather deal with a simple out of disk space error than perform some acrobatics to "safely" unzip a potential zip bomb.
Also zip bombs are not comically large until you unzip them.
Also, this way you can just unpack any sort of compressed file format without giving any thought to whether you are handling it safely.
anthk · 5h ago
I'd put up fake paper names (doi.numbers.whatever.zip) to quickly catch their attention, along with a robots.txt file for a /papers subdirectory to 'disallow' it. Add some index.html with links to fake 'papers' and in a week these crawlers will blacklist you like crazy.
jcynix · 3h ago
As I don't use PHP in my server, but get a lot of requests for various PHP related stuff, I added a rule to serve a Linux kernel encrypted with a "passphrase" derived from /dev/urandom as a reply for these requests. A zip bomb might be a worse reply ...
For all those "eagerly" fishing for content AI bots I ponder if I should set up a Markov chain to generate semi-legible text in the style of the classic https://en.wikipedia.org/wiki/Mark_V._Shaney ...
As an aside, there are a lot of people out there standing up massive microservice implementations¹ for relatively small sites/apps, which need to have this part printed, wrapped around a brick, and lobbed at their heads:
> A well-optimized, lightweight setup beats expensive infrastructure. With proper caching, a $6/month server can withstand tens of thousands of hits — no need for Kubernetes.
----
[1] Though doing this in order to play/learn/practise is, of course, understandable.
monster_truck · 18h ago
I do something similar using a script I've cobbled together over the years. Once a year I'll check the 404 logs and add the most popular paths trying to exploit something (ie ancient phpmyadmin vulns) to the shitlist. Requesting 3 of those URLs adds that host to a greylist that only accepts requests to a very limited set of legitimate paths.
fracus · 18h ago
I'm curious why a 10GB file of all zeroes would compress only to 10MB. I mean theoretically you could compress it to one byte. I suppose the compression happens on a stream of data instead of analyzing the whole, but I'd assume it would still do better than 10MB.
philsnow · 18h ago
A compressed file that is only one byte long can only represent maximally 256 different uncompressed files.
Signed, a kid in the 90s who downloaded some "wavelet compression" program from a BBS because it promised to compress all his WaReZ even more so he could then fit moar on his disk. He ran the compressor and hey golly that 500MB ISO fit into only 10MB of disk now! He found out later (after a defrag) that the "compressor" was just hiding data in unused disk sectors and storing references to them. He then learned about Shannon entropy from comp.compression.research and was enlightened.
david422 · 16h ago
> He found out later (after a defrag) that the "compressor" was just hiding data in unused disk sectors and storing references to them
So you could access the files until you wrote more data to disk?
thehappypm · 13h ago
Strange to think that this approach would actually work pretty damn well for most people, because most people aren't using all of their hard drive space.
jabl · 6h ago
Ha ha, that compressor is some evil genius.
Brings to mind this 30+ year old IOCCC entry for compressing C code by storing the code in the file names.
man, a comment that brings back memories. you and me both.
tom_ · 17h ago
It has to cater for any possible input. Even with special case handling for this particular (generally uncommon) case of vast runs of the same value: the compressed data will probably be packetized somehow, and each packet can reproduce only so many repeats, so you'll need to repeat each packet enough times to reproduce the output. With 10 GB, it mounts up.
I tried this on my computer with a couple of other tools, after creating a file full of 0s as per the article.
gzip -9 turns it into 10,436,266 bytes in approx 1 minute.
xz -9 turns it into 1,568,052 bytes in approx 4 minutes.
bzip2 -9 turns it into 7,506 (!) bytes in approx 5 minutes.
I think OP should consider getting bzip2 on the case. 2 TBytes of 0s should compress nicely. And I'm long overdue an upgrade to my laptop... you probably won't be waiting long for the result on anything modern.
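If you want to reproduce that comparison without shelling out to three different tools, Python's standard-library codecs (zlib for gzip, bz2 for bzip2, lzma for xz) give roughly the same picture. A sketch, using 1 GiB of zeroes instead of 10 GB purely to keep the runtime sane:
import bz2, lzma, zlib

ONE_GIB = 1 << 30
chunk = b"\0" * (1 << 20)  # feed the compressors 1 MiB of zeroes at a time

def compressed_size(compressor):
    total = 0
    for _ in range(ONE_GIB // len(chunk)):
        total += len(compressor.compress(chunk))
    return total + len(compressor.flush())

print("zlib -9 :", compressed_size(zlib.compressobj(9)))
print("bzip2 -9:", compressed_size(bz2.BZ2Compressor(9)))
print("xz -9   :", compressed_size(lzma.LZMACompressor(preset=9)))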
vitus · 15h ago
The reason why the discussion in this thread centers around gzip (and brotli / zstd) is because those are standard compression schemes that HTTP clients will generally support (RFCs 1952, 7932, and 8478).
As far as I can tell, the biggest amplification you can get out of zstd is 32768 times: per the standard, the maximum decompressed block size is 128KiB, and the smallest compressed block is a 3-byte header followed by a 1-byte block (e.g. run-length-encoded). Indeed, compressing a 1GiB file of zeroes yields 32.9KiB of output, which is quite close to that theoretical maximum.
Brotli promises to allow for blocks that decompress up to 16 MiB, so that actually can exceed the compression ratios that bzip2 gives you on that particular input. Compressing that same 1 GiB file with `brotli -9` gives an 809-byte output. If I instead opt for a 16 GiB file (dd if=/dev/zero of=/dev/stdout bs=4M count=4096 | brotli -9 -o zeroes.br), the corresponding output is 12929 bytes, for a compression ratio of about 1.3 million; theoretically this should be able to scale another 2x, but whether that actually plays out in practice is a different matter.
(The best compression for brotli should be available at -q 11, which is the default, but it's substantially slower to compress compared to `brotli -9`. I haven't worked out exactly what the theoretical compression ratio upper bound is for brotli, but it's somewhere between 1.3 and 2.8 million.)
Also note that zstd provides very good compression ratios for its speed, so in practice most use cases benefit from using zstd.
tom_ · 14h ago
That's a good point, thanks - I was thinking of this from the point of view of the client downloading a file and then trying to examine it, but of course you'd be much better off fucking up their shit at an earlier stage in the pipeline.
Dwedit · 1h ago
There's around a 64KB block size limit for a block of compressed data. That sets a max compression ratio.
dagi3d · 18h ago
I get your point (and have no idea why it isn't compressed more), but is the theoretical value of 1 byte correct? With just one single byte, how does it know how big the file should be after being decompressed?
hxtk · 16h ago
In general, this theoretical problem is called the Kolmogorov Complexity of a string: the size of the smallest program that outputs the input string, for some definition of "program", e.g., an initial input tape for a given universal Turing machine. Unfortunately, Kolmogorov Complexity in general is incomputable, because of the halting problem.
But a gzip decompressor is not Turing-complete, and there are no gzip streams that will expand to infinitely large outputs, so it is theoretically possible to find the pseudo-Kolmogorov-Complexity of a string for a given decompressor program by the following algorithm:
Let file.bin be a file containing the input byte sequence.
1. BOUNDS=$(gzip --best -c file.bin | wc -c)
2. LENGTH=1
3. If LENGTH==BOUNDS, run `gzip --best -c file.bin > test.bin.gz` and HALT.
4. Generate a file `test.bin.gz` LENGTH bytes long containing all zero bits.
5. Run `gunzip -k test.bin.gz`.
6. If `test.bin` equals `file.bin`, halt.
7. If `test.bin.gz` contains only 1 bits, increment LENGTH and GOTO 3.
8. Replace test.bin.gz with its lexicographic successor by interpreting it as a LENGTH-byte unsigned integer and incrementing it by 1.
9. GOTO 5.
test.bin.gz contains your minimal gzip encoding.
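For what it's worth, that exhaustive search is only a few lines of Python (a sketch, and as noted below it is hopeless in practice beyond a handful of bytes):
import gzip, itertools

def minimal_gzip(target: bytes) -> bytes:
    # Upper bound: whatever plain `gzip --best` produces.
    bound = len(gzip.compress(target, 9))
    for length in range(1, bound):
        # Enumerate every byte string of this length, shortest first.
        for candidate in itertools.product(range(256), repeat=length):
            blob = bytes(candidate)
            try:
                if gzip.decompress(blob) == target:
                    return blob
            except Exception:
                pass  # not a valid gzip stream; keep searching
    return gzip.compress(target, 9)  # nothing shorter found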
There are "stronger" compressors for popular compression libraries like zlib that outperform the "best" options available, but none of them are this exhaustive because you can surely see how the problem rapidly becomes intractable.
For the purposes of generating an efficient zip bomb, though, it doesn't really matter what the exact contents of the output file are. If your goal is simply to get the best compression ratio, you could enumerate all possible files with that algorithm (up to the bounds established by compressing all zeroes to reach your target decompressed size, which makes a good starting point) and then just check for a decompressed length that meets or exceeds the target size.
I think I'll do that. I'll leave it running for a couple days and see if I can generate a neat zip bomb that beats compressing a stream of zeroes. I'm expecting the answer is "no, the search space is far too large."
hxtk · 14h ago
I'm an idiot, of course the search space is too large. It outgrows what I can brute force by the heat death of the universe by the time it gets to 16 bytes, even if the "test" is a no-op.
I would need to selectively generate grammatically valid zstd streams for this to be tractable at all.
kulahan · 18h ago
It’s a zip bomb, so does the creator care? I just mean from a practical standpoint - overflows and crashes would be a fine result.
suid · 15h ago
Good question. The "ultimate zip bomb" looks something like https://github.com/iamtraction/ZOD - this produces the infamous "42.zip" file, which is about 42KiB, but expands to 3.99 PiB (!).
There's literally no machine on Earth today that can deal with that (as a single file, I mean).
vitus · 14h ago
> There's literally no machine on Earth today that can deal with that (as a single file, I mean).
Oh? Certainly not in RAM, but 4 PiB is about 125x 36TiB drives (or 188x 24TiB drives). (You can go bigger if you want to shell out tens of thousands per 100TB SSD, at which point you "only" need 45 of those drives.)
These are numbers such that a purpose-built server with enough SAS expanders could easily fit that within a single rack, for less than $100k (based on the list price of an Exos X24 before even considering any bulk discounts).
immibis · 8h ago
I think you can rent a server with about 4.5 PiB from OVH - as a standard product offering, not even a special request. It costs a lot, obviously.
zparky · 4h ago
I would hope if you request a 4.5 PiB allocation somebody somewhere tries to call you to ask if you didn't accidentally put a couple extra zeroes lol
Do most unzip programs work recursively by default?
moooo99 · 11h ago
No, at least not the ones I am aware of. iirc these kinds of attacks usually targeted content scanners (primarily antivirus). And an AV program would of course have to recursively decompress everything
rtkwe · 18h ago
It'd have to be more than one byte. There's the central directory, the zip header, and the local header; then for the file itself you also need to tell it how many zeros to emit when decompressing. And most compression algorithms don't work like that anyway, because they're designed for actual files, not essentially blank files, so you end up with something larger than the absolute minimum compression.
malfist · 16h ago
I mean, if I make a new compression algorithm that says a 10GB file of zeros is represented with a single specific byte, that would technically be compression.
All depends on how much magic you want to shove into an "algorithm"
rtkwe · 13h ago
If it's not standard I count the extra program required to decompress it as part of the archive.
eru · 10h ago
Yes, though in this case that wouldn't add much.
kulahan · 18h ago
There probably aren’t any perfectly lossless compression algorithms, I guess? Nothing would ever be all zeroes, so it might not be an edge case accounted for or something? I have no idea, just pulling at strings. Maybe someone smarter can jump in here.
mr_toad · 17h ago
No lossless algorithm can compress all strings; some will end up larger. This is a consequence of the pigeonhole principle.
ugurs · 18h ago
It requires at least a few bytes; there is no way to represent 10GB of data in 8 bits.
msm_ · 16h ago
But of course there is. Imagine the following compression scheme:
0-253: output the input byte
254 followed by 0: output 254
254 followed by 1: output 255
255: output 10GB of zeroes
Of course this is an artificial example, but theoretically it's perfectly sound. In fact, I think you could get there with static huffman trees supported by some formats, including gzip.
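A toy decoder for that scheme, just to show it is well defined (an illustration, obviously not a real format):
def decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 253:
            out.append(b)                    # literal byte
            i += 1
        elif b == 254:
            out.append(254 + data[i + 1])    # escaped 254 or 255
            i += 2
        else:                                # b == 255
            out.extend(b"\0" * 10_000_000_000)  # 10 GB of zeroes (mind your RAM)
            i += 1
    return bytes(out)
Under that scheme decode(b"\xff") really is a one-byte input that expands to 10 GB.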
ugurs · 7h ago
What you suggest is saving the information somewhere else and putting a number to represent it. That is not compression, that is mapping. By using this logic, one can argue that one bit is enough as well.
extraduder_ire · 3h ago
> 254 followed by 0: output 254
126, surely?
immibis · 8h ago
gzip isn't optimal for this case. It divides the file into blocks and each one has a header. Apparently that's about 1 byte per 1000.
monus · 3h ago
The hard part is the content of the isMalicious() function. The bots can crash but they'd be quick to restart anyway.
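For illustration, a minimal sketch of the kind of heuristic people mean here; the pattern list and the function name are my assumptions, not the author's actual code:
import re

# Hypothetical list of commonly probed paths; tune it from your own 404 logs.
SUSPICIOUS = re.compile(
    r"\.env|\.git/|wp-login\.php|phpmyadmin|xmlrpc\.php|\.\./|/etc/passwd",
    re.IGNORECASE,
)

def is_malicious(path: str) -> bool:
    return bool(SUSPICIOUS.search(path))
Anything fancier (rate limits, shared blocklists, reputation services) builds on the same idea.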
fareesh · 11h ago
Is there a list of popular attack vector urls located somewhere? I want to just auto-ban anyone sniffing for .env or ../../../../ etc.
Rather not write it myself
kqr · 11h ago
It would be a fairly short Perl script to read the access logs and curl a HEAD request to all URLs accessed, printing only the ones that don't come back with a 200 OK or a redirect (i.e. the probe paths that don't actually exist on your site).
Here's a start hacked together and tested on my phone:
perl -lnE 'if (/GET ([^ ]+)/ and $p=$1) {
$s=qx(curl -sI https://BASE_URL/$p | head -n 1);
unless ($s =~ /200|302/) {
say $p
}
}'
vander_elst · 10h ago
Also interested in this. For now I've left a server up for a couple of weeks, went through the logs and set up fail2ban for the most common offenders. Once a month or so I keep checking for offenders but the first iteration already blocked many of them.
BehindTheMath · 3h ago
Check out the ModSecurity WAF and the OWASP Core Rule Set.
eru · 10h ago
See https://research.swtch.com/zip for how to make an infinite zip bomb: ie a zip file that unzips to itself, so you can keep unzipping forever without ever hitting bottom.
manmal · 18h ago
> Before I tell you how to create a zip bomb, I do have to warn you that you can potentially crash and destroy your own device
Surely, the device does crash but it isn’t destroyed?
jawns · 20h ago
Is there any legal exposure possible?
Like, a legitimate crawler suing you and alleging that you broke something of theirs?
thayne · 19h ago
Disclosure: IANAL
The CFAA[1] prohibits:
> knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer;
As far as I can tell (again, IANAL) there isn't an exception if you believe said computer is actively attempting to abuse your system[2]. I'm not sure if a zip bomb would constitute intentional damage, but it is at least close enough to the line that I wouldn't feel comfortable risking it.
[2]: And of course, you might make a mistake and incorrectly serve this to legitimate traffic.
jedberg · 18h ago
I don't believe the client counts as a protected computer because they initiated the connection. Also a protected computer is a very specific definition that involves banking and/or commerce and/or the government.
thayne · 17h ago
Part B of the definition of "protected computer" says:
> which is used in or affecting interstate or foreign commerce or communication, including a computer located outside the United States that is used in a manner that affects interstate or foreign commerce or communication of the United States
Assuming the server is running in the States, I think that would apply unless the client is in the same state as the server, in which case there is probably a similar state law that comes into effect. I don't see anything there that excludes a client, and that makes sense, because otherwise it wouldn't prohibit having a site that tricks people into downloading malware.
jedberg · 12h ago
The word "accessed" is used multiple times throughout the law. A client accesses a server. A server does not access a client. It responds to a client.
Also, the protected computer has to be involved in commerce. Unless they are accessing the website with the zip bomb using a computer that is also used for interstate or foreign commerce, it won't qualify.
eru · 10h ago
> Also, the protected computer has to be involved in commerce.
> The Commerce Clause is the source of federal drug prohibition laws under the Controlled Substances Act. In a 2005 medical marijuana case, Gonzales v. Raich, the U.S. Supreme Court rejected the argument that the ban on growing medical marijuana for personal use exceeded the powers of Congress under the Commerce Clause. Even if no goods were sold or transported across state lines, the Court found that there could be an indirect effect on interstate commerce and relied heavily on a New Deal case, Wickard v. Filburn, which held that the government may regulate personal cultivation and consumption of crops because the aggregate effect of individual consumption could have an indirect effect on interstate commerce.
thayne · 11h ago
> The word "accessed" is used multiple times throughout the law.
So what? It isn't in the section I quoted above. I could be wrong, but my reading is that transmitting information that can cause damage with the intent of causing damage is a violation, regardless of if you "access" another system.
> Also, the protected computer has to be involved in commerce
Or communication.
Now, from an ethics standpoint, I don't think there is anything wrong with returning a zipbomb to malicious bots. But I'm not confident enough that doing so is legal that I would risk doing so.
jedberg · 10h ago
> So what? It isn't in the section I quoted above.
You can't read laws in sections like that. The sections go together. The entire law is about causing damage through malicious access. But servers don't access clients.
The section you quoted isn't relevant because the entire law is about clients accessing servers, not servers responding to clients.
thayne · 9h ago
Every reference to access I see in that law is in a separate item in the list of violations in section 1. Where do you see something that would imply that section 5a only applies to clients accessing servers?
immibis · 8h ago
A protected computer is "a computer which is protected by this law", which is most American computers, not a special class of American computers. The only reason it's not all American computers is that the US federal government doesn't have full jurisdiction over the US. They wrote the definition of "protected computer" to include all the computers they have jurisdiction over.
In particular, the interstate commerce clause is very over-reaching. It's been ruled that someone who grew their own crops to feed to their own farm animals sold locally was conducting interstate commerce because they didn't have to buy them from another state.
eqvinox · 7h ago
Just put a "by connecting to this service, you agree to and authorize…" at the front of the zipbomb.
(I'm half-joking, half-crying. It's how everything else works, basically. Why would it not work here? You could even go as far as explicitly calling it a "zipbomb test delivery service". It's not your fault those bots have no understanding what they're connecting to…)
gblargg · 8h ago
So the trick is to disguise it as an accident. Have the zip bomb look like a real HTML file at the beginning, then have zeroes after that, like it got corrupted.
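Something along these lines would do it, assuming you serve the result with Content-Encoding: gzip; the file names here are made up:
import gzip

# Build a gzip body that starts out as a plausible page and then "corrupts"
# into a long run of zeroes. Written as a stream, so it never holds 10 GiB in memory.
with open("bomb.html.gz", "wb") as raw:
    with gzip.GzipFile(fileobj=raw, mode="wb", compresslevel=9) as gz:
        gz.write(open("index.html", "rb").read())  # real-looking HTML prefix
        chunk = b"\0" * (1 << 20)
        for _ in range(10 * 1024):                 # roughly 10 GiB of zeroes
            gz.write(chunk)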
sinuhe69 · 16h ago
There is IMO no legal use case for an external computer system to initiate a connection with my system without prior legal agreement. It all happens on good will and therefore can be terminated at any time.
klabb3 · 13h ago
Just crossed my mind that perhaps lots of bot traffic is coming from botnets of unaware victims who downloaded a shitty game or similar, orchestrated by a malicious C&C server somewhere else. (There was a post about this type of malware recently.) Now, if you crash the victims machine, it’s complicated at least ethically, if not legally.
eru · 10h ago
Though ethically it might be a good thing to shut down their infected computer, instead of keeping it running.
brudgers · 19h ago
Though anyone can sue anyone, not doing X is the simplest thing that might avoid being sued for doing X.
But if it matters pay your lawyer and if it doesn’t matter, it doesn’t matter.
bilekas · 20h ago
Please, just as a conversational piece, walk me through the potentials you might think there are ?
I'll play the side of the defender and you can play the "bot"/bot deployer.
echoangle · 19h ago
Well creating a bot is not per se illegal, so assuming the maliciousness-detector on the server isn’t perfect, it could serve the zip bomb to a legitimate bot. And I don’t think it’s crazy that serving zip bombs with the stated intent to sabotage the client would be illegal. But I’m not a lawyer, of course.
bilekas · 18h ago
Disclosure, I'm not a lawyer either. This is all hypothetical high level discussion here.
> it could serve the zip bomb to a legitimate bot.
Can you define the difference between a legitimate bot, and a non legitimate bot for me ?
The OP didn't mention it, but if we can assume they have SOME form of robots.txt (a safe assumption given their history), would those bots who ignored the robots be considered legitimate/non-legitimate ?
Almost final question, and I know we're not lawyers here, but is there any precedent in case law or anywhere, which defines a 'bad bot' in the eyes of the law ?
Final final question, as a bot, do you believe you have a right or a privilege to scrape a website ?
echoangle · 8h ago
> Can you define the difference between a legitimate bot, and a non legitimate bot for me ?
Well by default every bot is legitimate, an illegitimate bot might be one that’s probing for security vulnerabilities (but I’m not even sure if that’s illegal if you don’t damage the server as a side effect, ie if you only try to determine the Wordpress or SSHD version running on the server for example).
> The OP didn't mention it, but if we can assume they have SOME form of robots.txt (safe assumtion given their history), would those bots who ignored the robots be considered legitimate/non-legitimate ?
robots.txt isn’t legally binding so I don’t think ignoring it makes a bot illegitimate.
> Almost final question, and I know we're not lawyers here, but is there any precedent in case law or anywhere, which defines a 'bad bot' in the eyes of the law ?
There might be but I don’t know any.
> Final final question, as a bot, do you believe you have a right or a privilege to scrape a website ?
Well I’m not a bot but I think I have the right to build bots to scrape websites (and not get served malicious content designed to sabotage my computer). You can decline service and just serve error pages of course if you don’t like my bot.
brudgers · 18h ago
Anyone can sue anyone for anything and the side with the most money is most likely to prevail.
pessimizer · 16h ago
Mantrapping is a fairly good analogy, and that's very illegal. If the person reading your gas meter gets caught in your mantrap, you're going to prison. You're probably going to prison if somebody burglarizing you gets caught in your mantrap.
Of course their computers will live, but if you accidentally take down your own ISP or maybe some third-party service that you use for something, I'd think they would sue you.
bauruine · 20h ago
>User-agent: *
>Disallow: /zipbomb.html
Legitimate crawlers would skip it this way; only scum ignores robots.txt.
echoangle · 19h ago
I’m not sure that’s enough, robots.txt isn’t really legally binding so if the zip bomb somehow would be illegal, guarding it behind a robots.txt rule probably wouldn’t make it fine.
boricj · 18h ago
> robots.txt isn’t really legally binding
Neither is the HTTP specification. Nothing is stopping you from running a Gopher server on TCP port 80, should you get into trouble if it happens to crash a particular crawler?
Making a HTTP request on a random server is like uttering a sentence to a random person in a city: some can be helpful, some may tell you to piss off and some might shank you. If you don't like the latter, then maybe don't go around screaming nonsense loudly to strangers in an unmarked area.
echoangle · 8h ago
The law might stop you from sending specific responses if the only goal is to sabotage the requesting computer. I’m not 100% familiar with US law but I think intentionally sabotaging a computer system would be illegal.
seqizz · 4h ago
I'm also not a lawyer, but wouldn't they dismiss the sabotage claim if the requester wasn't legally forced to make the request in the first place?
echoangle · 3h ago
No, why would they? If I voluntarily request your website, you can’t just reply with a virus that wipes my harddrive. Even though I had the option to not send the request. I didn’t know that you were going to sabotage me before I made the request.
seqizz · 2h ago
Because you requested it? There is no agreement on what or how to serve things, other than standards (your browser expects a valid document on the other side, etc).
I just assumed a court might say there is a difference between you requesting all guessable endpoints and finding one endpoint which will harm your computer (while there was _zero_ reason for you to access that page), and someone putting a zip bomb into index.html to intentionally harm everyone.
echoangle · 1h ago
So serving a document exploiting a browser zero day for RCE under a URL that’s discoverable by crawling (because another page links to it) with the intent to harm the client (by deleting local files for example) would be legitimate because the client made a request? That’s ridiculous.
lcnPylGDnU4H9OF · 19h ago
Has any similar case been tried? I'd think that a judge learning the intent of robots.txt and disallow rules is fairly likely to be sympathetic. Seems like it could go either way, I mean. (Jury is probably more a crap-shoot.)
thephyber · 19h ago
Who, running a crawler which violates robots.txt, is going to prosecute/sue the server owner?
The server owner can make an easy case to the jury that it is a booby trap to defend against trespassers.
dspillett · 2h ago
> can make an easy case to the jury that it is a booby trap to defend against trespassers
I don't know of any online cases, but the law in many (most?) places certainly tends to look unfavourably on physical booby-traps. Even in the US states with full-on “stand your ground” legislation and the UK where common law allows for all “reasonable force” in self-defence, booby-traps are usually not considered self-defence or standing ground. Essentially if it can go off automatically rather than being actioned by a person in a defensive action, it isn't self-defence.
> Who […] is going to prosecute/sue the server owner?
Likely none of them. They might though take tit-for-tat action and pull that zipbomb repeatedly to eat your bandwidth, and they likely have more and much cheaper bandwidth than your little site. Best have some technical defences ready for that, as you aren't going to sue them either: they are probably running from a completely different legal jurisdiction and/or the attack will come from a botnet with little or no evidence trail wrt who kicked it off.
eru · 10h ago
The law generally rewards good faith attempts, and robots.txt is an established commercial standard.
crazygringo · 18h ago
> For the most part, when they do, I never hear from them again. Why? Well, that's because they crash right after ingesting the file.
I would have figured the process/server would restart, and restart with your specific URL since that was the last one not completed.
What makes the bots avoid this site in the future? Are they really smart enough to hard-code a rule to check for crashes and avoid those sites in the future?
fdr · 18h ago
Seems like an exponential backoff rule would do the job: I'm sure crashes happen for all sorts of reasons, some of which are bugs in the bot, even on non-adversarial input.
geocrasher · 11h ago
15+ years ago I fought piracy at a company with very well known training materials for a prestigious certification. I'd distribute zip bombs marked as training material filenames. That was fun.
foundzen · 9h ago
It is surprising that it works (I haven't tried it). `Content-Length` had one goal: to ensure data integrity by comparing the response size with this header value. I'd expect an HTTP client to deal with this out of the box, whether gzip or not. Is that not the case? If so, that changes everything; a lot of servers need priority updates.
Aachen · 9h ago
You don't need to set a content length header, it'll take the page as finished when you close the connection
PeterStuer · 8h ago
"On my server, I've added a middleware that checks if the current request is malicious or not"
How accurate is that middleware? Obviously there are false negatives as you supplement with other heuristics. What about false positives? Just collateral damage?
thrwyep · 5h ago
I thought he maintains his own list of offenders
PeterStuer · 52m ago
The code shows both the 'middleware' and the custom list can put you in the naughty box
Ey7NFZ3P0nzAe · 9h ago
If anyone is interested in writing a guide to set this up with crowdsec or fail2ban I'm all ears
zzo38computer · 1d ago
I also had the idea of zip bomb to confuse badly behaved scrapers (and I have mentioned it before to some other people, although I did not implemented it). However, maybe instead of 0x00, you might use a different byte value.
I had other ideas too, but I don't know how well some of them will work (they might depend on what bots they are).
ycombinatrix · 1d ago
The different byte values likely won't compress as well as all 0s unless they are a repeating pattern of blocks.
An alternative might be to use Brotli which has a static dictionary. Maybe that can be used to achieve a high compression ratio.
dspillett · 28m ago
Compressing a sequence of any single character should give almost identical results length-wise (perhaps not exactly identical, but the difference will be vanishingly small).
Two bytes difference for a 1GiB sequence of “aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa…” (\141) compared to a sequence of \000.
zzo38computer · 22h ago
I meant that all of the byte values would be the same (so they would still be repeating), but a different value than zero. However, Brotli could be another idea if the client supports it.
welder · 8h ago
I like a similar trick, sending very large files hosted on external servers to malicious visitors using proxies. Usually those proxies charge by bandwidth, so it increases their costs.
guardian5x · 10h ago
I guess it goes without saying that the first thing should be to follow security best practices (patch vulnerabilities fast, etc.) before doing things like this. Then maybe his first website wouldn't have been compromised either.
tonyhart7 · 2h ago
ok but where do I put this?? in the files directory???
sgc · 20h ago
I am ignorant as to how most bots work. Could you have a second line of defense for bots that avoid this bomb: Dynamically generate a file from /dev/random and trickle stream it to them, or would they just keep spawning parallel requests? They would never finish streaming it, and presumably give up at some point. The idea would be to make it more difficult for them to detect it was never going to be valid content.
jerf · 20h ago
You want to consider the ratio of your resource consumption to their resource consumption. If you trickle bytes from /dev/random, you are holding open a TCP connection with some minimal overhead, and that's about what they are doing too. Let's assume they are bright enough to use any of the many modern languages or frameworks that can easily handle 10K/100K connections or more on a modern system. They aren't all that bright but certainly some are. You're basically consuming your resources to their resources 1:1. That's not a winning scenario for you.
The gzip bomb means you serve 10MB but they try to consume vast quantities of RAM on their end and likely crash. Much better ratio.
3np · 20h ago
Also might open up a new DoS vector on entropy consumed by /dev/random so it can be worse than 1:1.
jabl · 6h ago
As mentioned, not really an issue on a modern system. But in any case, you could just read, say, 1K from /dev/urandom into a buffer and then keep resending that buffer over and over again?
gkbrk · 14h ago
Entropy doesn't really get "consumed" on modern systems. You can read terabytes from /dev/random without running out of anything.
sgc · 20h ago
That's clear. It all comes down to their behavior. Will they sit there waiting to finish this download, or just start sending other requests in parallel until you dos yourself? My hope is they would flag the site as low-value and go looking elsewhere, on another site.
For HTTP/1.1 you could send a "chunked" response. Chunked responses are intended to allow the server to start sending dynamically generated content immediately instead of waiting for the generation process to finish before sending. You could just continue to send chunks until the client gives up or crashes.
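A bare-bones sketch of that idea, assuming plain HTTP/1.1 over a raw socket (port and delay are arbitrary, and it serves one victim at a time, which is exactly the 1:1 resource problem mentioned above):
import socket, time

srv = socket.socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8080))
srv.listen()
while True:
    conn, _ = srv.accept()
    try:
        conn.recv(4096)  # read and ignore the request
        conn.sendall(b"HTTP/1.1 200 OK\r\n"
                     b"Content-Type: text/html\r\n"
                     b"Transfer-Encoding: chunked\r\n\r\n")
        while True:
            conn.sendall(b"1\r\nx\r\n")  # a single one-byte chunk...
            time.sleep(10)               # ...every ten seconds, forever
    except OSError:
        pass                             # the client finally gave up
    finally:
        conn.close()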
The idea is to trickle it very slowly, like keeping a cat occupied with a ball of fluff in the corner.
uniqueuid · 20h ago
Cats also have timeouts set for balls of fluff. They usually get bored at some point and either go away or attack you :)
jeroenhd · 18h ago
If the bot is connecting over IPv4, you only have a couple thousand connections before your server starts needing to mess with shared sockets and other annoying connectivity tricks.
I don't think it's a terrible problem to solve these days, especially if you use one of the tarpitting implementations that use nftables/iptables/eBPF, but if you have one of those annoying Chinese bot farms with thousands of IP addresses hitting your server in turn (Huawei likes to do this), you may need to think twice before deploying this solution.
stavros · 16h ago
Yes but you still need to keep a connection open to them. This is a sort of reverse SlowLoris attack, though.
dredmorbius · 3h ago
You've got the option of abandoning the connection at any time should resources be needed elsewhere.
(Or rather, the tarpit should be programmed to do this, whether by having a maximum resource allocation or monitoring free system resources.)
CydeWeys · 20h ago
Yeah but in the mean time it's tying up a connection on your webserver.
thehappypm · 13h ago
This would work, but at times bots pretend not to be bots, so you occasionally do this to a real user
uniqueuid · 20h ago
Practically all standard libraries have timeouts set for such requests, unless you are explicitly offering streams which they would skip.
vivzkestrel · 13h ago
"But when I detect that they are either trying to inject malicious attacks, or are probing for a response" how are you detecting this? mind sharing some pseudocode?
Do you mind sharing the specs of your Digital Ocean droplet? I'm trying to set one up at a lower cost.
foxfired · 19h ago
The blog runs on a $6 digital ocean droplet. It's 1GB RAM and 25GB storage. There is a link at the end of the article on how it handles typical HN traffic. Currently at 5% CPU.
mightyrabbit99 · 7h ago
OP: Hi guys this is how I fend off hackers!
Hackers: Note taken.
nottorp · 9h ago
But what about the bots written in Rust? Will that get rid of them too?
dspillett · 2h ago
Rust-built processes are memory-safe in terms of avoiding corruption of their heaps & stacks by C-like problems such as rogue pointers and use-after-free, but they are still subject to OOM conditions, or running out of other storage, so they can easily be killed by a zip bomb if not coded in an appropriately defensive manner.
_QrE · 20h ago
There's a lot of creative ideas out there for banning and/or harassing bots. There's tarpits, infinite labyrinths, proof of work || regular challenges, honeypots etc.
Most of the bots I've come across are fairly dumb however, and those are pretty easy to detect & block. I usually use CrowdSec (https://www.crowdsec.net/), and with it you also get to ban the IPs that misbehave on all the other servers that use it before they come to yours. I've also tried turnstile for web pages (https://www.cloudflare.com/application-services/products/tur...) and it seems to work, though I imagine most such products would, as again most bots tend to be fairly dumb.
I'd personally hesitate to do something like serving a zip bomb since it would probably cost the bot farm(s) less than it would cost me, and just banning the IP I feel would serve me better than trying to play with it, especially if I know it's misbehaving.
Edit: Of course, the author could state that the satisfaction of seeing an IP 'go quiet' for a bit is priceless - no arguing against that
goodboyjojo · 5h ago
this was a cool read. very interesting stuff.
InDubioProRubio · 5h ago
If one wanted to create the ICE of cyberspace in cyberpunk, capable of destroying the device ...
OutOfHere · 4h ago
Serving a zip bomb is pretty illegal. The bot will restart its process anyway, and carry on as if nothing happened.
java-man · 1d ago
I think it's a good idea, but it must be coupled with robots.txt.
forinti · 19h ago
I was looking through my logs yesterday.
Bad bots don't even read robots.txt.
extraduder_ire · 3h ago
The worst ones treat it as a target.
cratermoon · 1d ago
AI scraper bots don't respect robots.txt
jsheard · 1d ago
I think that's the point, you'd use robots.txt to direct Googlebot/Bingbot/etc away from countermeasures that could potentially mess up your SEO. If other bots ignore the signpost clearly saying not to enter the tarpit, that's their own stupid fault.
reverendsteveii · 20h ago
The ones that survive do
cantrecallmypwd · 16h ago
Wouldn't it be cheaper to use Cloudflare than task a human to obsessively watch webserver logs on a box lacking proper filtering?
gkbrk · 14h ago
It's also cheaper to search Google Images for "Eiffel tower" than booking a flight to Paris and going there, but a lot of people enjoy doing the latter.
charcircuit · 13h ago
Many people would be better off sticking with the former than realizing what Paris actually is and being disappointed.
I had this in mind when visiting Paris and was pleasantly surprised. Lovely and beautiful city.
And to heck with cloudflare :S We don't need 3 companies controlling every part of the internet.
harrison_clarke · 20h ago
it'd be cool to have a proof of work protocol baked into http. like, a header that browsers understood
d--b · 20h ago
Zip libraries aren’t bomb proof yet? Seems fairly easy to detect and ignore, no?
cynicalsecurity · 19h ago
This topic comes up from time to time and I'm surprised no one yet mentioned the usual fearmongering rhetoric of zip bombs being potentially illegal.
I'm not a lawyer, but I'm yet to see a real-life court case of a bot owner suing a company or an individual for responding to his malicious request with a zip bomb. The usual spiel goes like this: responding to his malicious request with a malicious response makes you a cybercriminal and allows him (the real cybercriminal) to sue you. Again, apart from cheap talk, I've never heard of a single court case like this. But I can easily imagine them trying to blackmail someone with such cheap threats.
I cannot imagine a big company like Microsoft or Apple using zip bombs, but I fail to see why zip bombs would be considered bad in any way. Anyone with an experience of dealing with malicious bots knows the frustration and the amount of time and money they steal from businesses or individuals.
os2warpman · 18h ago
Anyone can sue anyone else for any reason.
This is what trips me up:
>On my server, I've added a middleware that checks if the current request is malicious or not.
There's a lot of trust placed in:
>if (ipIsBlackListed() || isMalicious()) {
Can someone assigned a previously blacklisted IP or someone who uses a tool to archive the website that mimics a bot be served malware? Is the middleware good enough or "good enough so far"?
Close enough to 100% of my internet traffic flows through a VPN. I have been blacklisted by various services upon connecting to a VPN or switching servers on multiple occasions.
immibis · 8h ago
Yes.
A user has to manually unpack a zip bomb, though. They have to open the file and see "uncompressed size: 999999999999999999999999999" and still try to uncompress it, at which point it's their fault when it fills up their drive and fails. So I don't think there's any ethical dilemma there.
wing-_-nuts · 1h ago
For some reason I was under the impression that browsers had the ability to transparently decompress certain archive formats? I may be thinking of less and gzip though
codingdave · 1d ago
Mildly amusing, but it seems like this is thinking that two wrongs make a right, so let us serve malware instead of using a WAF or some other existing solution to the bot problem.
imiric · 20h ago
The web is overrun by malicious actors without any sense of morality. Since playing by the rules is clearly not working, I'm in favor of doing anything in my power to waste their resources. I would go a step further and try to corrupt their devices so that they're unable to continue their abuse, but since that would require considerably more effort from my part, a zip bomb is a good low-effort solution.
bsimpson · 20h ago
There's no ethical ambiguity about serving garbage to malicious traffic.
They made the request. Respond accordingly.
petercooper · 20h ago
Based on the example in the post, that thinking might need to be extended to "someone happening to be using a blocklisted IP." I don't serve up zip bombs, but I've blocklisted many abusive bots using VPN IPs over the years which have then impeded legitimate users of the same VPNs.
joezydeco · 20h ago
This is William Gibson's "black ICE" becoming real, and I love it.
At least, not with the default rules. I read that discussion a few days ago and was surprised how few callouts there were that a WAF is just a part of the infrastructure - it is the rules that people are actually complaining about. I think the problem is that so many apps run on AWS and their default WAF rules have some silly content filtering. And their "security baseline" says that you have to use a WAF and include their default rules, so security teams lock down on those rules without any real thought put into whether or not they make sense for any given scenario.
I did actually try zip bombs at first. They didn't work due to the architecture of how Amazon's scraper works. It just made the requests get retried.
wiredfool · 20h ago
Amazon's scraper has been sending multiple requests per second to my servers for 6+ weeks, and every request has been returned 429.
Amazon's scraper doesn't back off. Meta, google, most of the others with identifiable user agents back off, Amazon doesn't.
toast0 · 20h ago
If it's easy, sleep 30 before returning 429. Or tcpdrop the connections and don't even send a response or a tcp reset.
cratermoon · 3h ago
That's a good way to self-DOS
toast0 · 2h ago
That's why I said, if it's easy. On some server stacks it's no big deal to have a connection open for an extra 30 seconds; others, you need to be done with requests asap, even abuse.
tcpdrop shouldn't self DOS though, it's using less resources. Even if other end does a retry, it will do it after a timeout; in the meantime, the other end has a socket state and you don't, that's a win.
deathanatos · 20h ago
So first, let me prefix this by saying I generally don't accept cookies from websites I don't explicitly first allow, my reasoning being "why am I granting disk read/write access to [mostly] shady actors to allow them to track me?"
(I don't think your blog qualifies as shady … but you're not in my allowlist, either.)
So if I visit https://anubis.techaro.lol/ (from the "Anubis" link), I get an infinite anime cat girl refresh loop — which honestly isn't the worst thing ever?
Neither xeserv.us nor techaro.lol are in my allowlist. Curious that one seems to pass. IDK.
The blog post does have that lovely graph … but I suspect I'll loop around the "no cookie" loop in it, so the infinite cat girls are somewhat expected.
I was working on an extension that would store cookies very ephemerally for the more malicious instances of this, but I think its design would work here too. (In-RAM cookie jar, burns them after, say, 30s. Persisted long enough to load the page.)
xena · 17h ago
You're seeing an experiment in progress. It seems to be working, but I have yet to get enough data to know if it's ultimately successful or not.
cycomanic · 19h ago
Just FYI temporary containers (Firefox extension) seem to be the solution you're looking for. It essentially generates a new container for every tab you open (subtabs can be either new containers or in the same container). Once the tab is closed it destroys the container and deletes all browsing data (including cookies). You can still whitelist some domains to specific persistent containers.
I used cookie blockers for a long time, but always ended up having to whitelist some sites even though I didn't want their cookies because the site would misbehave without them. Now I just stopped worrying.
lcnPylGDnU4H9OF · 19h ago
> Neither xeserv.us nor techaro.lol are in my allowlist. Curious that one seems to pass. IDK.
Is your browser passing a referrer?
cookiengineer · 1d ago
Did you also try Transfer-Encoding: chunked and things like HTTP smuggling to serve different content to web browser instances than to scrapers?
chmod775 · 1d ago
Truly one of my favorite thought-terminating proverbs.
"Hurting people is wrong, so you should not defend yourself when attacked."
"Imprisoning people is wrong, so we should not imprison thieves."
Also the modern telling of Robin Hood seems to be pretty generally celebrated.
Two wrongs may not make a right, but often enough a smaller wrong is the best recourse we have to avert a greater wrong.
The spirit of the proverb is referring to wrongs which are unrelated to one another, especially when using one to excuse another.
cantrecallmypwd · 16h ago
> "Hurting people is wrong, so you should not defend yourself when attacked."
This is exactly what Californian educators told kids who were being bullied in the 90's.
zdragnar · 20h ago
> a smaller wrong is the best recourse we have to avert a greater wrong
The logic of terrorists and war criminals everywhere.
impulsivepuppet · 18h ago
I admire your deontological zealotry. That said, I think there is an implied virtuous aspect of "internet vigilantism" that feels ignored (i.e. disabling a malicious bot means it does not visit other sites) While I do not absolve anyone from taking full responsibility for their actions, I have a suspicion that terrorists do a bit more than just avert a greater wrong--otherwise, please sign me up!
Pretty neat.
I had a lazy fix for down detection on my RPi server at home: it pinged a domain I owned, and if it couldn't hit that it assumed it wasn't connected to a network and rebooted itself. I let the domain lapse and the RPi kept going down every 5 minutes or so... thought it was a power fault, then I remembered that CRON job.
Later on, browsers started to check for actual content I think, and would abort such requests.
Years later I was finally able to open it.
Among things that didn't work were qutebrowser, icecat, nsxiv, feh, imv, mpv. I did worry at first the file was corrupt, I was redownloading it, comparing hashes with a friend, etc. Makes for an interesting benchmark, I guess.
For others curious, here's the file: https://0x0.st/82Ap.png
I'd say just curl/wget it, don't expect it to load in a browser.
Takes a few seconds, but otherwise seems pretty ok in desktop Safari. Preview.app also handles it fine (albeit does allocate an extra ~1-2GB of RAM)
Old school acdsee would have been fine too.
I think it's all the pixel processing on the modern image viewers (or they're just using system web views that isn't 100% just a straight render).
I suspect that the more native renderers are doing some extra magic here. Or just being significantly more OK with using up all your ram.
It also pans and zooms swiftly
Partially zoomed in was fine, but zooming to maximum fidelity resulted in the tab crashing (it was completely responsive until the crash). Looks like Safari does some pretty smart progressive rendering, but forcing it to render the image at full resolution (by zooming in) causes the render to get OOMed or similar.
Pan&zoom works instantly with a blurry preview and then takes another 5-10s to render completely.
I suggested trying the HN-beloved Sumatra PDF. Ugh, it couldn't cope with it. Chrome coped better.
Surprisingly, Windows 95 didn't die trying to load it, but quite a lot of operations in the system took noticeably longer than they normally did.
Any ideas?
yes "<div>"|dd bs=1M count=10240 iflag=fullblock|gzip | pv > zipdiv.gz
Resulting file is about 15 mib long and uncompresses into a 10 gib monstrosity containing 1789569706 unclosed nested divs
Also you can reverse many DoS vectors depending on how you are set up and what your costs are. For example, reverse a Slowloris attack and use up their connections.
I think this was it:
https://freedomhacker.net/annoying-favicon-crash-bug-firefox...
Ok, not a real zip bomb, for that we would need a kernel module.
Write an ordinary static html page and fill a <p> with infinite random data using <!--#include file="/dev/random"-->.
or would that crash the server?
I am not sure how that could’ve worked. Unless the real /dev tree was exposed to your webserver’s chroot environment, this would’ve given nothing special except “file not found”.
The whole point of chroot for a webserver was to shield clients from accessing special files like that!
Even if you knew it was done with a symlink you don't know that - these days odds are it'd run in a container or vm, and so having access to /dev/zero means very little.
https://medium.com/@bishr_tabbaa/when-smart-ships-divide-by-...
"On 21 September 1997, the USS Yorktown halted for almost three hours during training maneuvers off the coast of Cape Charles, Virginia due to a divide-by-zero error in a database application that propagated throughout the ship’s control systems."
" technician tried to digitally calibrate and reset the fuel valve by entering a 0 value for one of the valve’s component properties into the SMCS Remote Database Manager (RDM)"
https://www.google.com/search?q=windows+nt+bug+affects+ship
Though, bots may not support modern compression standards. Then again, that may be a good way to block bots: every modern browser supports zstd, so just force that on non-whitelisted browser agents and you automatically confuse scrapers.
[1] checkboxes demo https://checkboxes.andersmurphy.com
[2] article on brotli SSE https://andersmurphy.com/2025/04/15/why-you-should-use-brotl...
it is basically a quine.
How bad the tab process dying is, depends per browser. If your browser does site isolation well, it'll only crash that one website and you'll barely notice. If that process is shared between other tabs, you might lose state there. Chrome should be fine, Firefox might not be depending on your settings and how many tabs you have open, with Safari it kind of depends on how the tabs were opened and how the browser is configured. Safari doesn't support zstd though, so brotli bombs are the best you can do with that.
I know it's slightly off topic, but it's just so amusing (edit: reassuring) to know I'm not the only one who, after 1 hour of setting up Wordpress, finds a PHP shell magically deployed on my server.
>Oh look 3 separate php shells with random strings as a name
Never less than 3, but always guaranteed.
But it's such a bad platform that there really isn't any reason for anybody to use WordPress for anything. No matter your use case, there will be a better alternative to WordPress.
I've tried Drupal in the past for such situations, but it was too complicated for them. That was years ago, so maybe it's better now.
> new
Pretty sure Drupal has been around for like, 20 years or so. Or is this a different Drupal?
It appears Drupal CMS is a customized version of Drupal that is easier for less tech-savvy folks to get up and running. At least, that's the impression I got reading through the marketing hype that "explains" it with nothing but buzzwords.
25 years ago we used Microsoft Frontpage for that, with the web root mapped to a file share that the non-technical secretary could write to and edit it as if it were a word processor.
Somehow I feel we have regressed from that simplicity, with nothing but hand waving to make up for it. This method was declared "obsolete" and ... Wordpress kludges took its place as somehow "better". Someone prove me wrong.
The other part is clients freaking out after Frontpage had a series of dangerous CVEs all in a row.
And then finally every time a part of Frontpage got popular, MS would deprecate the API and replace it with a new one.
Wordpress was in the right place at the right time.
Could be automated better (drop ZIP to a share somewhere where it gets processed and deployed) but best of both worlds.
And the only hosted option for the copyrighted code starts at 300/y
these don't cover any use case people use WordPress for.
No comments yet
- very hard to hack because we pre render all assets to a Cloudflare kv store
- public website and CMS editor are on different domains
Basically very hard to hack. Also as a bonus is much more reliable as it will only go down when Cloudflare does.
[0] https://decapcms.org/
In one, multiple users can login, edit WYSIWYG, preview, add images, etc, all from one UI. You can access it from any browser including smart phones and tablets.
In the other, you get to instruct users on git, how to deal with merge conflicts, code review (two people can't easily work on a post like they can in wordpress), previews require a manual build, you need a local checkout and local build installation to do the build. There no WYSIWYG, adding images is a manual process of copying a file, figuring out the URL, etc... No smartphone/tablet support. etc....
I switched my blog from a Wordpress install to a static site generator because I got tired of having to keep it up to date, but my posting dropped because the friction of posting went way up. I could no longer post from a phone. I couldn't easily add images. I had to build to preview. And I had to submit via git commits and pushes. All of that meant what was easy became tedious.
For example (not affiliated with them) https://www.siteleaf.com/
IIRC, Eleventy printed lots of out-of-date warnings when I installed it and/or the default style was broken in various ways which didn't give me much confidence.
My younger sister asked me to help her start a blog. I just pointed her to substack. Zero effort, easy for her.
I build mine with GitHub Actions and host it free on Pages.
Edit: I actually feel a bit sorry for the SurrealCMS developer. He has a fantastic product that should be an industry standard, but it's fairly unknown.
Then WordPress is just your private CMS/UI for making changes, and it generates static files that are uploaded to a webhost like CloudFlare Pages, GitHub Pages, etc.
Now that plugin became a service, at which point you might just use a WP host and let them do their thing.
If they are selling anything on their website, it's probably going to be through a cloud hosted third party service and then it's just an embedded iframe on their website.
If you're making an entire web shop for a very large enterprise or something of similar magnitude, then you have to ask somebody else than me.
Everything I've built in the past like 5 years has been almost entirely pure ES6 with some helpers like jsviews.
https://survey.stackoverflow.co/2024/technology#1-web-framew...
https://youmightnotneedjquery.com/
There's a few plugins that do this, but vanilla WP is dangerous.
I've used this teaching folks devops, here deploy your first hello world nginx server... huh what are those strange requests in the log?
Edit: And for folks who write their own web pages, you can always create zip bombs that are links on a web page that don't show up for humans (white text on white background with no highlight on hover/click anchors). Bots download those things to have a look (so do crawlers and AI scrapers)
I did a version of this with my form for requesting an account on my fediverse server. The problem I was having is that there exist these very unsophisticated bots that crawl the web and submit their very unsophisticated spam into every form they see that looks like it might publish it somewhere.
First I added a simple captcha with distorted characters. This did stop many of the bots, but not all of them. Then, after reading the server log, I noticed that they only make three requests in a rapid succession: the page that contains the form, the captcha image, and then the POST request with the form data. They don't load neither the CSS nor the JS.
So I added several more fields to the form and hid them with CSS. Submitting anything in these fields will fail the request and ban your session. I also modified the captcha, I made the image itself a CSS background, and made the src point to a transparent image instead.
And just like that, spam has completely stopped, while real users noticed nothing.
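Server-side, the check can be as dumb as this (the field names are made up; the point is that anything a human can't see must stay empty):
HONEYPOT_FIELDS = ("website", "fax", "confirm_email")  # hidden via CSS in the form

def is_spam_submission(form: dict) -> bool:
    # Real users never see these inputs, so any value in them means the
    # client never rendered the CSS, i.e. it is almost certainly a dumb bot.
    return any(form.get(name) for name in HONEYPOT_FIELDS)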
RIP screen reader users?
https://github.com/skeeto/endlessh
This is the main reason I haven't installed zip bombs on my website already -- on the off chance I'd make someone angry and end up having to fend off a DDoS.
Currently I have some URL patterns to which I'll return 418 with no content, just to save network / processing time (since if a real user encounters a 404 legitimately, I want it to have a nice webpage for them to look at).
Should probably figure out how to wire that into fail2ban or something, but not a priority at the moment.
Automated banning is harder, you'd probably want a heuristic system and look up info on IPs.
IPv4 with NAT means you can "overban" too.
It's also not a common metric you can filter on in open firewalls since you must lookup and maintain a cache of IP to ASN, which has to be evicted and updated as blocks still move around.
The practical effect of this was you could place a zip bomb in an office xml document and this product would pass the ooxml file through even if it contained easily identifiable malware.
The file size problem is still an issue for many big name EDRs.
Scanning them is resource-intensive. The choices are (1) skip scanning them; (2) treat them as malware; (3) scan them and be DoS'ed.
(deferring the decision to a human is effectively DoS'ing your IT support team)
It's not working very well.
In the web server log, I can see that the bots are not downloading the whole ten megabyte poison pill.
They are cutting off at various lengths. I haven't seen anything fetch more than around 1.5 Mb of it so far.
Or is it working? Are they decoding it on the fly as a stream, and then crashing? E.g. if something is recorded as having read 1.5 Mb, could it have decoded it to 1.5 Gb in RAM, on the fly, and crashed?
There is no way to tell.
PS: I'm on the bots side, but don't mind helping.
I've noticed that LLM scrapers tend to be incredibly patient. They'll wait for minutes for even small amounts of text.
Anyway, from the bots' perspective, labyrinths aren't the main problem. The internet is being flooded with quality LLM-generated content.
Many of these are annoying LLM training/scraping bots (in my case anyway). So while it might not crash them if you spit out a 800KB zipbomb, at least it will waste computing resources on their end.
Secondly, I know that most of these bots do not come back. The attacks don't reuse addresses against the same server, which evades almost any conceivable filter rule predicated on a prior visit.
> as soon as an IP address is logged as having visited the trap URL (honeypot, or zipbomb or whatever), a log monitoring script bans that client.
Is this not why they aren’t getting the full file?
[0] https://www.bamsoftware.com/hacks/zipbomb/ [1] https://www.bamsoftware.com/hacks/zipbomb/#safebrowsing
10T is probably overkill though.
Think about it:
Other than that, why serve gzip anyway? I would not set the Content-Length header, would throttle the connection, set the MIME type to something random (hell, just octet-stream), and redirect to /dev/random.
I don't get the 'zip bomb' concept: all you are doing is compressing zeros. Why not compress /dev/random? You'll get a much larger file, and if the bot receives it, it'll have a lot more CPU cycles to churn.
Even the OP article states that after creating the '10GB.gzip', 'The resulting file is 10MB in this case.'
Is it because it sounds big?
Here is how you don't waste time with 'zip bombs':
The compression ratio is the whole point... if you can send something small for next to no $$ which causes the receiver to crash due to RAM, storage, compute, etc constraints, you win.
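To see that concretely: here's a minimal sketch (Python stdlib only, scaled down to 10 MiB) of why the bomb is built from zeros rather than from /dev/random. Repeated bytes are what buy you the amplification; random data barely compresses at all, so you'd be paying to ship nearly the full size over the wire.

    import gzip, os

    size = 10 * 1024 * 1024          # 10 MiB is enough to see the effect

    zeros = bytes(size)              # all-zero payload, like the article's bomb
    noise = os.urandom(size)         # what compressing /dev/random would give you

    print(len(gzip.compress(zeros, 9)))   # on the order of 10 KB: roughly 1000:1
    print(len(gzip.compress(noise, 9)))   # slightly *larger* than the input

The /dev/random idea costs the bot bandwidth and time at a 1:1 ratio, and it costs you exactly the same; the bomb costs you ~10 MB per request and costs the client 10 GB of decompressed output.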
cgroups with hard-limits will let the external tool's process crash without taking down the script or system along with it.
This is exactly the same idea as partitioning, though.
In a practical sense, how's that different from creating an N-byte partition and letting the OS return ENOSPC to you?
I'm sure, though, that if it were as simple as that, we wouldn't even have a name for it.
It's just nobody usually implements a limit during decompression because people aren't usually giving you zip bombs. And sometimes you really do want to decompress ginormous files, so limits aren't built in by default.
Your given language might not make it easy to do, but you should pretty much always be able to hack something together using file streams. It's just an extra step is all.
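For instance, in Python the stdlib's streaming decompressor makes the cap a handful of lines. A rough sketch for gzip input (the helper name and the 100 MiB limit are arbitrary examples, not recommendations):

    import zlib

    MAX_OUTPUT = 100 * 1024 * 1024   # cap on decompressed size; tune to your actual needs

    def bounded_gunzip(stream, limit=MAX_OUTPUT, chunk=64 * 1024):
        """Decompress a gzip stream, refusing to produce more than `limit` bytes."""
        d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)     # 16 = expect a gzip header
        out = bytearray()
        while True:
            piece = stream.read(chunk)
            if not piece:
                break
            out += d.decompress(piece, limit + 1 - len(out))  # never inflate past the cap
            if len(out) > limit or d.unconsumed_tail:
                raise ValueError("decompressed size limit exceeded")
        return bytes(out)

    # with open("suspicious.gz", "rb") as f:
    #     data = bounded_gunzip(f)

The key is the second argument to decompress(): it bounds how much output a single call may produce, so a tiny input can't balloon in memory before you get a chance to check.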
Also zip bombs are not comically large until you unzip them.
Also you can just unpack any sort of compressed file format without giving any thought to whether you are handling it safely.
For all those AI bots "eagerly" fishing for content, I wonder whether I should set up a Markov chain to generate semi-legible text in the style of the classic https://en.wikipedia.org/wiki/Mark_V._Shaney ...
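A throwaway word-level chain is only a couple dozen lines. A sketch (the corpus filename is just a placeholder for whatever text you want to imitate):

    import random
    from collections import defaultdict

    def build_chain(text, order=2):
        # Map each `order`-word prefix to the words that followed it in the corpus.
        chain = defaultdict(list)
        words = text.split()
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def babble(chain, length=200):
        state = random.choice(list(chain))
        out = list(state)
        for _ in range(length):
            followers = chain.get(state)
            if not followers:                      # dead end: jump to a random prefix
                state = random.choice(list(chain))
                out.extend(state)
                continue
            nxt = random.choice(followers)
            out.append(nxt)
            state = (*state[1:], nxt)
        return " ".join(out)

    # print(babble(build_chain(open("my_old_posts.txt").read())))

Serve the output slowly enough and the scraper pays for every byte of gibberish while your own CPU cost stays negligible.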
https://www.hackerfactor.com/blog/index.php?/archives/762-At...
> A well-optimized, lightweight setup beats expensive infrastructure. With proper caching, a $6/month server can withstand tens of thousands of hits — no need for Kubernetes.
----
[1] Though doing this in order to play/learn/practise is, of course, understandable.
Signed, a kid in the 90s who downloaded some "wavelet compression" program from a BBS because it promised to compress all his WaReZ even more so he could then fit moar on his disk. He ran the compressor and hey golly that 500MB ISO fit into only 10MB of disk now! He found out later (after a defrag) that the "compressor" was just hiding data in unused disk sectors and storing references to them. He then learned about Shannon entropy from comp.compression.research and was enlightened.
So you could access the files until you wrote more data to disk?
Brings to mind this 30+ year old IOCCC entry for compressing C code by storing the code in the file names.
https://www.ioccc.org/1993/lmfjyh/index.html
I tried this on my computer with a couple of other tools, after creating a file full of 0s as per the article.
gzip -9 turns it into 10,436,266 bytes in approx 1 minute.
xz -9 turns it into 1,568,052 bytes in approx 4 minutes.
bzip2 -9 turns it into 7,506 (!) bytes in approx 5 minutes.
I think OP should consider getting bzip2 on the case. 2 TBytes of 0s should compress nicely. And I'm long overdue an upgrade to my laptop... you probably won't be waiting long for the result on anything modern.
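If you'd like to reproduce that comparison without creating the input file at all (or waiting on the CLI tools), the stdlib's incremental compressors let you stream zeros straight in. A sketch at 1 GiB, which won't match the exact byte counts above but shows the same ordering:

    import bz2, lzma, zlib

    size  = 1 << 30                  # 1 GiB of zeros, streamed so RAM use stays tiny
    chunk = bytes(1 << 20)           # 1 MiB of \x00 at a time

    compressors = {
        "gzip -9":  zlib.compressobj(9, wbits=zlib.MAX_WBITS | 16),
        "xz -9":    lzma.LZMACompressor(preset=9),
        "bzip2 -9": bz2.BZ2Compressor(9),
    }

    for name, c in compressors.items():
        total = sum(len(c.compress(chunk)) for _ in range(size // len(chunk)))
        total += len(c.flush())
        print(name, total, "bytes")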
As far as I can tell, the biggest amplification you can get out of zstd is 32768 times: per the standard, the maximum decompressed block size is 128KiB, and the smallest compressed block is a 3-byte header followed by a 1-byte block (e.g. run-length-encoded). Indeed, compressing a 1GiB file of zeroes yields 32.9KiB of output, which is quite close to that theoretical maximum.
Brotli promises to allow for blocks that decompress up to 16 MiB, so that actually can exceed the compression ratios that bzip2 gives you on that particular input. Compressing that same 1 GiB file with `brotli -9` gives an 809-byte output. If I instead opt for a 16 GiB file (dd if=/dev/zero of=/dev/stdout bs=4M count=4096 | brotli -9 -o zeroes.br), the corresponding output is 12929 bytes, for a compression ratio of about 1.3 million; theoretically this should be able to scale another 2x, but whether that actually plays out in practice is a different matter.
(The best compression for brotli should be available at -q 11, which is the default, but it's substantially slower to compress compared to `brotli -9`. I haven't worked out exactly what the theoretical compression ratio upper bound is for brotli, but it's somewhere between 1.3 and 2.8 million.)
Also note that zstd provides very good compression ratios for its speed, so in practice most use cases benefit from using zstd.
But a gzip decompressor is not turing-complete, and there are no gzip streams that will expand to infinitely large outputs, so it is theoretically possible to find the pseudo-Kolmogorov-Complexity of a string for a given decompressor program by the following algorithm:
Let file.bin be a file containing the input byte sequence.
1. BOUNDS=$(gzip --best -c file.bin | wc -c)
2. LENGTH=1
3. If LENGTH==BOUNDS, run `gzip --best -c file.bin > test.bin.gz` and HALT.
4. Generate a file `test.bin.gz` LENGTH bytes long containing all zero bits.
5. Run `gunzip -k test.bin.gz` (if gunzip rejects the data, treat the next step as a non-match).
6. If `test.bin` equals `file.bin`, halt.
7. If `test.bin.gz` contains only 1 bits, increment LENGTH and GOTO 3.
8. Replace test.bin.gz with its lexicographic successor by interpreting it as a LENGTH-byte unsigned integer and incrementing it by 1.
9. GOTO 5.
test.bin.gz contains your minimal gzip encoding.
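The same procedure, sketched in Python just to make it concrete (and to make it obvious why it's only tractable for toy inputs: the inner loop enumerates 256^LENGTH candidates):

    import gzip
    from itertools import product

    def minimal_gzip(target: bytes) -> bytes:
        bound = len(gzip.compress(target, 9))         # upper bound from `gzip --best`
        for length in range(1, bound):
            for candidate in product(range(256), repeat=length):
                blob = bytes(candidate)
                try:
                    if gzip.decompress(blob) == target:
                        return blob                   # shortest stream that inflates to target
                except Exception:
                    continue                          # not a valid gzip stream; keep counting
        return gzip.compress(target, 9)               # nothing shorter found; fall back to gzip

In practice the bound is dozens of bytes even for trivial inputs, so 256^LENGTH blows up long before the search gets anywhere interesting.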
There are "stronger" compressors for popular compression libraries like zlib that outperform the "best" options available, but none of them are this exhaustive because you can surely see how the problem rapidly becomes intractable.
For the purposes of generating an efficient zip bomb, though, it doesn't really matter what the exact contents of the output file are. If your goal is simply to get the best compression ratio, you could enumerate all possible files with that algorithm (up to the bounds established by compressing all zeroes to reach your target decompressed size, which makes a good starting point) and then just check for a decompressed length that meets or exceeds the target size.
I think I'll do that. I'll leave it running for a couple days and see if I can generate a neat zip bomb that beats compressing a stream of zeroes. I'm expecting the answer is "no, the search space is far too large."
I would need to selectively generate grammatically valid zstd streams for this to be tractable at all.
There's literally no machine on Earth today that can deal with that (as a single file, I mean).
Oh? Certainly not in RAM, but 4 PiB is about 125x 36TiB drives (or 188x 24TiB drives). (You can go bigger if you want to shell out tens of thousands per 100TB SSD, at which point you "only" need 45 of those drives.)
These are numbers such that a purpose-built server with enough SAS expanders could easily fit that within a single rack, for less than $100k (based on the list price of an Exos X24 before even considering any bulk discounts).
42.zip has five layers. But you can make a zip file that has an infinite number of layers. See https://research.swtch.com/zip or https://alf.nu/ZipQuine
All depends on how much magic you want to shove into an "algorithm"
126, surely?
Rather not write it myself
Here's a start hacked together and tested on my phone:
Surely, the device does crash but it isn’t destroyed?
Like, a legitimate crawler suing you and alleging that you broke something of theirs?
The CFAA[1] prohibits:
> knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer;
As far as I can tell (again, IANAL) there isn't an exception if you believe said computer is actively attempting to abuse your system[2]. I'm not sure if a zip bomb would constitute intentional damage, but it is at least close enough to the line that I wouldn't feel comfortable risking it.
[1]: https://www.law.cornell.edu/uscode/text/18/1030
[2]: And of course, you might make a mistake and incorrectly serve this to legitimate traffic.
> which is used in or affecting interstate or foreign commerce or communication, including a computer located outside the United States that is used in a manner that affects interstate or foreign commerce or communication of the United States
Assuming the server is running in the States, I think that would apply unless the client is in the same state as the server, in which case there is probably similar state law that comes into effect. I don't see anything there that excludes a client, and that makes sense, because otherwise it wouldn't prohibit having a site that tricks people into downloading malware.
Also, the protected computer has to be involved in commerce. Unless they are accessing the website with the zip bomb using a computer that is also used for interstate or foreign commerce, it won't qualify.
In the US, virtually everything is involved in 'interstate commerce'. See https://en.wikipedia.org/wiki/Commerce_Clause
> The Commerce Clause is the source of federal drug prohibition laws under the Controlled Substances Act. In a 2005 medical marijuana case, Gonzales v. Raich, the U.S. Supreme Court rejected the argument that the ban on growing medical marijuana for personal use exceeded the powers of Congress under the Commerce Clause. Even if no goods were sold or transported across state lines, the Court found that there could be an indirect effect on interstate commerce and relied heavily on a New Deal case, Wickard v. Filburn, which held that the government may regulate personal cultivation and consumption of crops because the aggregate effect of individual consumption could have an indirect effect on interstate commerce.
So what? It isn't in the section I quoted above. I could be wrong, but my reading is that transmitting information that can cause damage with the intent of causing damage is a violation, regardless of if you "access" another system.
> Also, the protected computer has to be involved in commerce
Or communication.
Now, from an ethics standpoint, I don't think there is anything wrong with returning a zipbomb to malicious bots. But I'm not confident enough that doing so is legal that I would risk doing so.
You can't read laws in sections like that. The sections go together. The entire law is about causing damage through malicious access. But servers don't access clients.
The section you quoted isn't relevant because the entire law is about clients accessing servers, not servers responding to clients.
In particular, the interstate commerce clause is very over-reaching. It's been ruled that someone who grew their own crops to feed to their own farm animals sold locally was conducting interstate commerce because they didn't have to buy them from another state.
(I'm half-joking, half-crying. It's how everything else works, basically. Why would it not work here? You could even go as far as explicitly calling it a "zipbomb test delivery service". It's not your fault those bots have no understanding what they're connecting to…)
But if it matters, pay your lawyer; and if it doesn't matter, it doesn't matter.
I'll play the side of the defender and you can play the "bot"/bot deployer.
> it could serve the zip bomb to a legitimate bot.
Can you define the difference between a legitimate bot, and a non legitimate bot for me ?
The OP didn't mention it, but if we can assume they have SOME form of robots.txt (a safe assumption given their history), would those bots that ignored it be considered legitimate/non-legitimate?
Almost final question, and I know we're not lawyers here, but is there any precedent in case law or anywhere, which defines a 'bad bot' in the eyes of the law ?
Final final question, as a bot, do you believe you have a right or a privilege to scrape a website ?
Well, by default every bot is legitimate; an illegitimate bot might be one that's probing for security vulnerabilities (but I'm not even sure that's illegal if you don't damage the server as a side effect, i.e. if you only try to determine the WordPress or SSHD version running on the server, for example).
> The OP didn't mention it, but if we can assume they have SOME form of robots.txt (a safe assumption given their history), would those bots that ignored it be considered legitimate/non-legitimate?
robots.txt isn’t legally binding so I don’t think ignoring it makes a bot illegitimate.
> Almost final question, and I know we're not lawyers here, but is there any precedent in case law or anywhere, which defines a 'bad bot' in the eyes of the law ?
There might be but I don’t know any.
> Final final question, as a bot, do you believe you have a right or a privilege to scrape a website ?
Well I’m not a bot but I think I have the right to build bots to scrape websites (and not get served malicious content designed to sabotage my computer). You can decline service and just serve error pages of course if you don’t like my bot.
https://en.wikipedia.org/wiki/Mantrap_(snare)
Of course their computers will live, but if you accidentally take down your own ISP or maybe some third-party service that you use for something, I'd think they would sue you.
>Disallow: /zipbomb.html
Legitimate crawlers would skip it this way; only scum ignores robots.txt.
Neither is the HTTP specification. Nothing is stopping you from running a Gopher server on TCP port 80, should you get into trouble if it happens to crash a particular crawler?
Making a HTTP request on a random server is like uttering a sentence to a random person in a city: some can be helpful, some may tell you to piss off and some might shank you. If you don't like the latter, then maybe don't go around screaming nonsense loudly to strangers in an unmarked area.
I just assumed a court might say there is a difference between you requesting all guessable endpoints and finding one endpoint that harms your computer (when there was _zero_ reason for you to access that page), and someone putting a zipbomb into index.html to intentionally harm everyone.
The server owner can make an easy case to the jury that it is a booby trap to defend against trespassers.
I don't know of any online cases, but the law in many (most?) places certainly tends to look unfavourably on physical booby-traps. Even in the US states with full-on “stand your ground” legislation and the UK where common law allows for all “reasonable force” in self-defence, booby-traps are usually not considered self-defence or standing ground. Essentially if it can go off automatically rather than being actioned by a person in a defensive action, it isn't self-defence.
> Who […] is going to prosecute/sue the server owner?
Likely none of them. They might though take tit-for-tat action and pull that zipbomb repeatedly to eat your bandwidth, and they likely have more and much cheaper bandwidth than your little site. Best have some technical defences ready for that, as you aren't going to sue them either: they are probably running from a completely different legal jurisdiction and/or the attack will come from a botnet with little or no evidence trail wrt who kicked it off.
I would have figured the process/server would restart, and restart with your specific URL since that was the last one not completed.
What makes the bots avoid this site in the future? Are they really smart enough to hard-code a rule to check for crashes and avoid those sites in the future?
How accurate is that middleware? Obviously there are false negatives as you supplement with other heuristics. What about false positives? Just collateral damage?
I had other ideas too, but I don't know how well some of them will work (they might depend on what bots they are).
An alternative might be to use Brotli which has a static dictionary. Maybe that can be used to achieve a high compression ratio.
For example, with gzip:
Two bytes difference for a 1GiB sequence of “aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa…” (\141) compared to a sequence of \000.
The gzip bomb means you serve 10MB but they try to consume vast quantities of RAM on their end and likely crash. Much better ratio.
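A quick way to check that which byte you repeat barely matters (scaled down to 64 MiB so it runs in seconds, stdlib only):

    import gzip

    size = 64 * 1024 * 1024
    a_run   = b"a" * size
    nul_run = b"\x00" * size

    # The two outputs land within a few bytes of each other; what matters for
    # the bomb is the ratio, not which byte is repeated.
    print(len(gzip.compress(a_run, 9)), len(gzip.compress(nul_run, 9)))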
See https://en.wikipedia.org/wiki/Slowloris_(cyber_attack)
[0]: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
I don't think it's a terrible problem to solve these days, especially if you use one of the tarpitting implementations that use nftables/iptables/eBPF, but if you have one of those annoying Chinese bot farms with thousands of IP addresses hitting your server in turn (Huawei likes to do this), you may need to think twice before deploying this solution.
(Or rather, the tarpit should be programmed to do this, whether by having a maximum resource allocation or monitoring free system resources.)
Most of the bots I've come across are fairly dumb however, and those are pretty easy to detect & block. I usually use CrowdSec (https://www.crowdsec.net/), and with it you also get to ban the IPs that misbehave on all the other servers that use it before they come to yours. I've also tried turnstile for web pages (https://www.cloudflare.com/application-services/products/tur...) and it seems to work, though I imagine most such products would, as again most bots tend to be fairly dumb.
I'd personally hesitate to do something like serving a zip bomb since it would probably cost the bot farm(s) less than it would cost me, and just banning the IP I feel would serve me better than trying to play with it, especially if I know it's misbehaving.
Edit: Of course, the author could state that the satisfaction of seeing an IP 'go quiet' for a bit is priceless - no arguing against that
Bad bots don't even read robots.txt.
https://en.wikipedia.org/wiki/Paris_syndrome
And to heck with cloudflare :S We don't need 3 companies controlling every part of the internet.
I'm not a lawyer, but I have yet to see a real-life court case of a bot owner suing a company or an individual for responding to his malicious request with a zip bomb. The usual spiel goes like this: responding to his malicious request with a malicious response makes you a cybercriminal and allows him (the real cybercriminal) to sue you. Again, apart from cheap talk, I've never heard of a single court case like this. But I can easily imagine them trying to blackmail someone with such cheap threats.
I cannot imagine a big company like Microsoft or Apple using zip bombs, but I fail to see why zip bombs would be considered bad in any way. Anyone with an experience of dealing with malicious bots knows the frustration and the amount of time and money they steal from businesses or individuals.
This is what trips me up:
>On my server, I've added a middleware that checks if the current request is malicious or not.
There's a lot of trust placed in:
>if (ipIsBlackListed() || isMalicious()) {
Can someone assigned a previously blacklisted IP or someone who uses a tool to archive the website that mimics a bot be served malware? Is the middleware good enough or "good enough so far"?
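For what it's worth, the serving side of the trick is only a few lines: a pre-built gzip file sent with Content-Encoding: gzip, so a well-behaved client inflates it itself. All of the false-positive risk lives in that predicate. A rough sketch with made-up trap paths and a stand-in blacklist (not the article's actual middleware):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    with open("10GB.gzip", "rb") as f:       # pre-built once: zeros, gzipped down to ~10 MB
        BOMB = f.read()

    BLACKLIST = set()                                        # stand-in IP blacklist
    TRAP_PATHS = {"/wp-login.php", "/xmlrpc.php", "/.env"}   # stand-in "malicious" heuristic

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            ip = self.client_address[0]
            if ip in BLACKLIST or self.path in TRAP_PATHS:
                self.send_response(200)
                self.send_header("Content-Encoding", "gzip")  # client decompresses on its own
                self.send_header("Content-Type", "text/html")
                self.send_header("Content-Length", str(len(BOMB)))
                self.end_headers()
                self.wfile.write(BOMB)        # ~10 MB on the wire, 10 GB once inflated
            else:
                self.send_response(404)
                self.end_headers()
                self.wfile.write(b"Not here.")

    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()

Whether a mistaken classification here is merely rude or legally risky is exactly what the rest of this thread is arguing about.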
Close enough to 100% of my internet traffic flows through a VPN. I have been blacklisted by various services upon connecting to a VPN or switching servers on multiple occasions.
A user has to manually unpack a zip bomb, though. They have to open the file and see "uncompressed size: 999999999999999999999999999" and still try to uncompress it, at which point it's their fault when it fills up their drive and fails. So I don't think there's any ethical dilemma there.
They made the request. Respond accordingly.
https://williamgibson.fandom.com/wiki/ICE
Amazon's scraper doesn't back off. Meta, google, most of the others with identifiable user agents back off, Amazon doesn't.
tcpdrop shouldn't self-DoS though; it's using fewer resources. Even if the other end does a retry, it will do it after a timeout; in the meantime, the other end has socket state and you don't. That's a win.
(I don't think your blog qualifies as shady … but you're not in my allowlist, either.)
So if I visit https://anubis.techaro.lol/ (from the "Anubis" link), I get an infinite anime cat girl refresh loop — which honestly isn't the worst thing ever?
But if I go to https://xeiaso.net/blog/2025/anubis/ and click "To test Anubis, click here." … that one loads just fine.
Neither xeserv.us nor techaro.lol are in my allowlist. Curious that one seems to pass. IDK.
The blog post does have that lovely graph … but I suspect I'll loop around the "no cookie" loop in it, so the infinite cat girls are somewhat expected.
I was working on an extension that would store cookies very ephemerally for the more malicious instances of this, but I think its design would work here too. (In-RAM cookie jar, burns them after, say, 30s. Persisted long enough to load the page.)
I used cookie blockers for a long time, but always ended up having to whitelist some sites even though I didn't want their cookies because the site would misbehave without them. Now I just stopped worrying.
Is your browser passing a referrer?
"Hurting people is wrong, so you should not defend yourself when attacked."
"Imprisoning people is wrong, so we should not imprison thieves."
Also the modern telling of Robin Hood seems to be pretty generally celebrated.
Two wrongs may not make a right, but often enough a smaller wrong is the best recourse we have to avert a greater wrong.
The spirit of the proverb is referring to wrongs which are unrelated to one another, especially when using one to excuse another.
This is exactly what Californian educators told kids who were being bullied in the 90's.
The logic of terrorists and war criminals everywhere.
Do you really want to live in a society where all use of punishment to discourage bad behaviour in others is off the table? That is a game-theoretical disaster...
Crime and Justice are not the same.
If you cannot figure that out, you ARE a major part of the problem.
Keep thinking until you figure it out for good.