Why are anime catgirls blocking my access to the Linux kernel?

94 points by taviso | 112 comments | 8/20/2025, 2:54:45 PM | lock.cmpxchg8b.com

Comments (112)

Arnavion · 1h ago
>This dance to get access is just a minor annoyance for me, but I question how it proves I’m not a bot. These steps can be trivially and cheaply automated.

>I think the end result is just an internet resource I need is a little harder to access, and we have to waste a small amount of energy.

No need to mimic the actual challenge process. Just change your user agent to not have "Mozilla" in it; Anubis only serves you the challenge if the user agent contains that. For myself I just made a sideloaded browser extension to override the UA header for the handful of websites I visit that use Anubis, including those two kernel.org domains.

(Why do I do it? For most of them I don't enable JS so the challenge wouldn't pass anyway. For the ones that I do enable JS for, various self-hosted gitlab instances, I don't consent to my electricity being used for this any more than if it was mining Monero or something.)
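
For illustration, a minimal Python sketch of probing an Anubis-fronted page with and without a "Mozilla" token in the User-Agent. The URL and header strings are placeholders, and what actually comes back depends on how the site has configured Anubis:

  import requests

  URL = "https://git.kernel.org/"  # placeholder: any Anubis-fronted page

  for ua in (
      "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
      "plain-client/1.0",  # no "Mozilla" token
  ):
      r = requests.get(URL, headers={"User-Agent": ua}, allow_redirects=False)
      # With a browser-style UA an Anubis deployment typically answers with its
      # interstitial challenge page; without the "Mozilla" token the request is
      # usually passed straight through to the backend.
      print(ua.split()[0], "->", r.status_code, len(r.content), "bytes")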

zahlman · 27m ago
> Just change your user agent to not have "Mozilla" in it. Anubis only serves you the challenge if you have that.

Won't that break many other things? My understanding was that basically everyone's user-agent string nowadays is packed with a full suite of standard lies.

Arnavion · 24m ago
It doesn't break the two kernel.org domains that the article is about, nor any of the others I use. At least not in a way that I noticed.
Animats · 58m ago
> (Why do I do it? For most of them I don't enable JS so the challenge wouldn't pass anyway. For the ones that I do enable JS for, various self-hosted gitlab instances, I don't consent to my electricity being used for this any more than if it was mining Monero or something.)

Hm. If your site is "sticky", can it mine Monero or something in the background?

We need a browser warning: "This site is using your computer heavily in a background task. Do you want to stop that?"

mikestew · 32m ago
> We need a browser warning: "This site is using your computer heavily in a background task. Do you want to stop that?"

Doesn't Safari sort of already do that? "This tab is using significant power", or summat? I know I've seen that message, I just don't have a good repro.

qualeed · 12m ago
Edge does, as well. It drops a warning in the middle of the screen, displays the resource-hogging tab, and asks whether you want to force-close the tab or wait.
throw84a747b4 · 1h ago
Not only is Anubis a poorly thought out solution from an AI sympathizer, it was probably vibecoded.

On top of Tavis's findings, last month it was reported to have exposed the sites using it to a reflected XSS vulnerability:

https://github.com/TecharoHQ/anubis/security/advisories/GHSA...

gruez · 1h ago
>Not only is Anubis a poorly thought out solution from an AI sympathizer [...]

But the project description describes it as a project to stop AI crawlers?

> Weighs the soul of incoming HTTP requests to stop AI crawlers

throw84a747b4 · 58m ago
Why would a company that wants to stop AI crawlers give talks on LLMs and diffusion models at AI conferences?

Why would they use AI art for the first Anubis mascot until GitHub users called out the hypocrisy on the issue tracker?

Why would they use Stable Diffusion art in their blogposts until Mastodon and Bluesky users called them out on it?

Imustaskforhelp · 40m ago
I am not completely against AI art, since I think of it as editing rather than art itself. My thoughts on AI art are nuanced and worth discussing some other day; let's talk about the author and story of Anubis.

So, I hope you know the entire story behind Anubis: they were hosting their own git server (I think?) and Amazon's AI-related crawlers were basically DDoSing it by trying to scrape it, so they created Anubis to prevent that.

The idea isn't that new, it is just proof of work, and they created it first for their own use. I think they are an AI researcher or otherwise work in AI, so for them using AI pics wasn't that big of a deal; I'm pretty sure they had some reason behind it, and even that has since been changed.

Stop whining about free projects/labour, man. The same people comment that these AI scrapers are scraping so many websites and taking away the livelihood of website makers, and now someone has given you a countermeasure for free and you are nitpicking the wrong things.

You can just fork it without the anime images or without the AI thing if you don't align with them and their philosophy.

Man, now I feel the Mandela effect: I read somewhere on their blog, or something like that, that they themselves feel the hypocrisy (pardon me if I am wrong, I usually am). But they themselves (I think?) would like to stop working in the AI industry while maintaining an anti-AI-scraper tool, though they might need more donations for that, iirc, and they themselves acknowledge the hypocrisy.

johnnyanmac · 2m ago
[delayed]
shkkmo · 1m ago
> Stop whining about free projects/labour man. The same people comment oh well these AI scrapers are scraping so many websites and taking livelihood of website makers and now you have someone who just gave it to ya for free and you are nitpicking the wrong things.

That isn't the issue. The issue is that this tool is not fit for purpose and is inappropriate to be used by the projects that have adopted it.

The proof of work scheme is idiotic. As explained in the article, it's super easy for bad actors to mine enough tokens to bypass it, while it interferes with and wastes the time of good actors.

It's almost like the author deliberately designed a tool that only looks like it is doing something while actually trivially allowing the very thing it was supposedly built to prevent.

s1mplicissimus · 6m ago
Uh, so I'm not invested either way, but the tone of your post suggests to me that you might have fallen for some kind of conspiratorial thinking. Hope ur doing ok.
johnnyanmac · 1m ago
[delayed]
sugarpimpdorsey · 27m ago
Every time I see one of these I think it's a malicious redirect to some pervert-dwelling imageboard.

On that note, is kernel.org really using this for free and not the paid version without the anime? Is the Linux Foundation really that desperate for cash after they gas up all the BMWs?

qualeed · 23m ago
It's crazy (especially considering anime is more popular now than ever; Netflix alone is making billions a year on anime) that people see a completely innocent little anime picture and immediately think "pervert-dwelling imageboard".
Seattle3503 · 18m ago
To be fair, that's the sort of place where I spend most of my free time.
ants_everywhere · 1m ago
they've seized the moment to move the anime cat girls off the Arch Linux desktop wallpapers and onto lore.kernel.org.
turtletontine · 6m ago
Even if the images aren’t the kind of sexualized (or downright pornographic) content this implies… having cutesy anime girls pop up when a user loads your site is, at best, wildly unprofessional. (Dare I say “cringe”?) For something as serious and legit as kernel.org to have this, I do think it’s frankly shocking and unacceptable.
qualeed · 59s ago
>I do think it’s frankly shocking and unacceptable.

You're certainly entitled to your opinion. But I think that's a really strong reaction to a completely innocent picture. Companies that post memes must drive you to the edge of sanity.

gruez · 12m ago
"Anime pfp" stereotype is alive and well.
rootsudo · 1h ago
As soon as I read it, I knew it was Anubis. I hope the anime catgirls never disappear from that project :)
bakugo · 55m ago
It's more likely that the project itself will disappear into irrelevance as soon as AI scrapers bother implementing the PoW (which is trivial for them, as the post explains) or figure out that they can simply remove "Mozilla" from their user-agent to bypass it entirely.
debugnik · 8m ago
[delayed]
skydhash · 12m ago
It's more about the (intentional?) DDoS from AI scrapers than preventing them from accessing the content. Bandwidth is not cheap.
dingnuts · 9m ago
PoW increases the cost for the bots which is great. Trivial to implement, sure, but that added cost will add up quickly.

Anyway, then we'll move on to tarpits using traditional methods to cheaply generate real enough looking content that the data becomes worthless.

Fuck AI scrapers, and fuck all this copyright infringement at scale. If it was illegal for Aaron Swartz it's definitely illegal for Sam Altman.

Frankly, most of these scrapers are in violation of the CFAA as well, a federal crime.

naikrovek · 36s ago
The catgirl thing annoys me. Maybe I’m too old. I’ve had enough anime exposure to last me for 1000 years.

Yes, yes. You’re a very special anime person and your vtube identity is very popular, just like everyone else’s.

I guess I hate vanity. “Everyone look at meeeeeee” stuff.

Everything is fighting for my attention. Every company, lots of individual people, all the time, everywhere. I figured one place that I would see this later rather than sooner was kernel.org.

Borg3 · 5m ago
Oh, it's time to bring the Internet back to humans. Maybe it's time to treat the first layer of the Internet just as transport. Then layer large VPN networks on top and put services there. People will just VPN to a vISP to reach content. Different networks, different interests :) But this time don't fuck up abuse handling. Someone is doing something fishy? Depeer them from the network (or their un-cooperating upstream!).
ksymph · 5h ago
This is neither here nor there but the character isn't a cat. It's in the name, Anubis, who is an Egyptian deity typically depicted as a jackal or generic canine, and the gatekeeper of the afterlife who weighs the souls of the dead (hence the tagline). So more of a dog-girl, or jackal-girl if you want to be technical.
listic · 8m ago
So... Is Anubis actually blocking bots because they didn't bother to circumvent it?
xena · 4h ago
This same author also ignored the security policy and dropped an Anubis double-spend attack on the issue tracker. Their email got eaten by my spam filter so I didn't realize that I got emailed at all.

Fun times.

tptacek · 44m ago
You needed to have a security contact on your website, or at least in the repo. You did not. You assumed security researchers would instead back out to your Github account's repository list, find the .github repository, and look for a security policy there. That's not a thing!

I'm really surprised you wrote this.

qualeed · 26m ago
>I'm really surprised you wrote this.

I agree with the rest of your comment, but this seems like a weird little jab to add on for no particular reason. Am I misinterpreting?

tptacek · 23m ago
No, there's some background context I'm not sharing, but it's not interesting. I didn't mean to be cryptic, but, obviously, I managed to be cryptic. I promise you're not missing anything.
withinrafael · 1h ago
The security policy that didn't exist until a few hours ago?
david_allison · 1h ago
withinrafael · 1h ago
Adding a security policy to an unrelated repository is easily missed and questionably applicable.
Borgz · 1h ago
In a different repository, though. I think it's understandable that someone would miss it.
valiant55 · 1h ago
I really don't understand the hostility towards the mascot. I can't think of a bigger red flag.
Borgz · 1h ago
Funny to say this when the article literally says "nothing wrong with mascots!"

Out of curiosity, what did you read as hostility?

valiant55 · 40m ago
Oh, I totally reacted to the title. The last few times Anubis has been the topic there are always comments about the "cringy" mascot, and putting that front and center in the title made me believe "anime catgirls" was meant as an insult.
Imustaskforhelp · 36m ago
Honestly I am okay with the anime catgirls since I just find them funny, but it would still be cool to see Linux-related stuff. Imagine a gif of Tux racing in SuperTuxKart for the Linux website.

SourceHut also uses Anubis, but they have replaced the anime catgirl with their own logo. I think Disroot also does that, though I am not sure.

Arnavion · 14m ago
Sourcehut uses go-away, not Anubis.
Imustaskforhelp · 8m ago
https://sourcehut.org/blog/2025-04-15-you-cannot-have-our-us...

> As you may have noticed, SourceHut has deployed Anubis to parts of our services to protect ourselves from aggressive LLM crawlers.

It's nice that SourceHut themselves have talked about it on their own blog, but I had discovered this through the Anubis website's showcase or something like that, iirc.

hansjorg · 33m ago
If you want a tip my friend, just block all of Huawei Cloud by ASN.
bogwog · 1h ago
I wonder if the best solution is still just to create link mazes with garbage text like this: https://blog.cloudflare.com/ai-labyrinth/

It won't stop the crawlers immediately, but it might lead to an overhyped and underwhelming LLM release from a big name company, and force them to reassess their crawling strategy going forward?
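
For illustration, a minimal sketch of the link-maze idea (the endpoint, port, and filler words are made up; real systems like Cloudflare's Labyrinth generate far more convincing decoy text):

  import random
  import string
  from http.server import BaseHTTPRequestHandler, HTTPServer

  WORDS = ["lorem", "ipsum", "kernel", "scheduler", "entropy", "labyrinth"]

  class Maze(BaseHTTPRequestHandler):
      def do_GET(self):
          # Every page is throwaway text plus links to freshly invented pages, so a
          # crawler doing a blind walk of the site never runs out of URLs to fetch.
          text = " ".join(random.choices(WORDS, k=200))
          links = []
          for _ in range(5):
              slug = "".join(random.choices(string.ascii_lowercase, k=8))
              links.append(f'<a href="/{slug}">more</a>')
          body = f"<html><body><p>{text}</p>{' '.join(links)}</body></html>".encode()
          self.send_response(200)
          self.send_header("Content-Type", "text/html")
          self.end_headers()
          self.wfile.write(body)

  HTTPServer(("127.0.0.1", 8080), Maze).serve_forever()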

leumon · 47m ago
Seems like AI bots are indeed bypassing the challenge by computing it: https://social.anoxinon.de/@Codeberg/115033790447125787
debugnik · 1m ago
[delayed]
jimmaswell · 5h ago
What exactly is so bad about AI crawlers compared to Google or Bing? Is there more volume or is it just "I don't like AI"?
dilDDoS · 1h ago
As others have said, it's definitely volume, but also the lack of respecting robots.txt. Most AI crawlers that I've seen bombarding our sites just relentlessly scrape anything and everything, without even checking to see if anything has changed since the last time they crawled the site.
benou · 1h ago
Yep, AI scrapers have been breaking our open-source project's Gerrit instance hosted at the Linux Network Foundation.

Why this is a problem now, when web crawlers have been scraping the web for the last 30 years, is a mystery to me. This should be a solved problem. But it looks like this field is full of badly behaving companies with complete disregard for the common good.

blibble · 1h ago
they seem to be written by idiots and/or people who don't give a shit about being good internet citizens

either way the result is the same: they induce massive load

well-written crawlers will (a minimal sketch follows the list):

  - not hit a specific ip/host more frequently than say 1 req/5s
  - put newly discovered URLs at the end of a distributed queue (NOT do DFS per domain)
  - limit crawling depth based on crawled page quality and/or response time
  - respect robots.txt
  - make it easy to block them
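
For illustration, a minimal sketch of a crawler loop that follows those rules (the hostnames and user-agent string are placeholders; a real crawler would also need depth limits, a distributed queue, and error handling):

  import time
  import urllib.robotparser
  from collections import deque
  from urllib.parse import urlparse
  from urllib.request import Request, urlopen

  MIN_DELAY = 5.0  # at most one request per host every 5 seconds
  UA = "example-crawler/0.1 (+https://example.org/bot-info)"  # easy to identify and block

  last_hit, robots = {}, {}
  queue = deque(["https://example.org/"])  # placeholder seed URL

  while queue:
      url = queue.popleft()                 # breadth-first queue, not per-domain DFS
      host = urlparse(url).netloc
      if host not in robots:                # fetch and cache robots.txt once per host
          rp = urllib.robotparser.RobotFileParser(f"https://{host}/robots.txt")
          rp.read()
          robots[host] = rp
      if not robots[host].can_fetch(UA, url):
          continue                          # respect robots.txt
      wait = MIN_DELAY - (time.time() - last_hit.get(host, 0.0))
      if wait > 0:
          time.sleep(wait)                  # per-host rate limit
      last_hit[host] = time.time()
      page = urlopen(Request(url, headers={"User-Agent": UA})).read()
      # ...parse `page`, append newly discovered URLs to the end of `queue`...
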
Philpax · 5h ago
Volume, primarily - the scrapers are running full-tilt, which many dynamic websites aren't designed to handle: https://pod.geraspora.de/posts/17342163
zahlman · 25m ago
Why not just actually rate-limit everyone, instead of slowing them down with proof-of-work?
NobodyNada · 9m ago
My understanding is that AI scrapers rotate IPs to bypass rate-limiting. Anubis requires clients to solve a proof-of-work challenge upon their first visit to the site to obtain a token that is tied to their IP and is valid for some number of requests -- thus rate-limiting impolite scrapers by forcing them to solve a new PoW challenge each time they rotate IPs, while being unobtrusive for regular users and scrapers that don't try to bypass rate limits.
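
A rough sketch of that scheme as described above (not Anubis's actual implementation; the request budget and token format are assumptions):

  import hashlib, hmac, secrets

  SECRET = secrets.token_bytes(32)   # server-side signing key
  BUDGET = 1000                      # requests allowed per solved challenge (assumed)
  spent = {}                         # token -> requests used so far

  def issue_token(client_ip: str) -> str:
      # Handed out only after the client solves the PoW challenge. Binding the
      # signature to the IP means a token can't be shared across a proxy pool.
      nonce = secrets.token_hex(8)
      sig = hmac.new(SECRET, f"{client_ip}:{nonce}".encode(), hashlib.sha256).hexdigest()
      return f"{nonce}:{sig}"

  def check_token(client_ip: str, token: str) -> bool:
      nonce, sig = token.split(":")
      expected = hmac.new(SECRET, f"{client_ip}:{nonce}".encode(), hashlib.sha256).hexdigest()
      if not hmac.compare_digest(sig, expected):
          return False               # wrong IP or forged token: back to the challenge
      spent[token] = spent.get(token, 0) + 1
      return spent[token] <= BUDGET  # over budget: solve a fresh challenge
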
immibis · 3h ago
Why haven't they been sued and jailed for DDoS, which is a felony?
ranger_danger · 2h ago
Criminal convictions in the US require a standard of proof that is "beyond a reasonable doubt," and I suspect cases like this would not pass the required mens rea test: in their minds at least (and probably a judge's), there was no ill intent to cause a denial of service. Trying to argue otherwise based on technical reasoning (e.g. "most servers cannot handle this load and they somehow knew it") is IMO unlikely to sway the court, especially considering that web scraping has already been ruled legal and that a ToS clause against it cannot be legally enforced.
slowmovintarget · 2h ago
I thought only capital crimes (murder, for example) held the standard of beyond a reasonable doubt. Lesser crimes require the standard of either a "Preponderance of Evidence" or "Clear and Convincing Evidence" as burden of proof.

Still, even by those lesser standards, it's hard to build a case.

eurleif · 1h ago
No, all criminal convictions require proof beyond a reasonable doubt: https://constitution.congress.gov/browse/essay/amdt14-S1-5-5...

>Absent a guilty plea, the Due Process Clause requires proof beyond a reasonable doubt before a person may be convicted of a crime.

Majromax · 1h ago
It's civil cases that have the lower standard of proof. Civil cases arise when one party sues another, typically seeking money, and they are claims in equity, where the defendant is alleged to have harmed the plaintiff in some way.

Criminal cases require proof beyond a reasonable doubt. Most things that can result in jail time are criminal cases. Criminal cases are almost always brought by the government, and criminal acts are considered harm to society rather than to (strictly) an individual. In the US, criminal cases are classified as "misdemeanors" or "felonies," but that language is not universal in other jurisdictions.

jmclnx · 1h ago
>The CAPTCHA forces visitors to solve a problem designed to be very difficult for computers but trivial for humans

Not for me; I have nothing but a hard time solving CAPTCHAs. About 50% of the time I give up after 2 tries.

serf · 1h ago
it's still certainly trivial for you compared to mentally computing a SHA256 op.
fluoridation · 5h ago
Hmm... What if instead of using plain SHA-256 it was a dynamically tweaked hash function that forced the client to run it in JS?
jsnell · 3h ago
No, the economics will never work out for a Proof of Work-based counter-abuse challenge. CPU is just too cheap in comparison to the cost of human latency. An hour of a server CPU costs $0.01. How much is an hour of your time worth?

That's all the asymmetry you need to make it unviable. Even if the attacker is no better at solving the challenge than your browser is, there's no way to tune the monetary cost to be even in the ballpark to the cost imposed to the legitimate users. So there's no point in theorizing about an attacker solving the challenges cheaper than a real user's computer, and thus no point in trying to design a different proof of work that's more resistant to whatever trick the attackers are using to solve it for cheap. Because there's no trick.
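
Putting rough numbers on that asymmetry (the $0.01/CPU-hour figure is from the comment above; the 250 ms solve time is an assumption borrowed from a comment further down):

  # Back-of-the-envelope cost of solving an Anubis-style PoW challenge at scale.
  cpu_hour_usd = 0.01      # assumed price of one server CPU-hour
  solve_seconds = 0.25     # assumed CPU time per challenge

  per_solve = cpu_hour_usd * solve_seconds / 3600
  print(f"cost per solved challenge: ${per_solve:.10f}")          # ~$0.0000007
  print(f"cost per million challenges: ${per_solve * 1e6:.2f}")   # ~$0.69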

pavon · 47m ago
But for a scraper to be effective it has to load orders of magnitude more pages than a human browses, so a fixed delay causes a human to take 1.1x as long but slows a scraper down by 100x. Requiring 100x more hardware to do the same job is absolutely a significant economic impediment.
fluoridation · 2h ago
>An hour of a server CPU costs $0.01. How much is an hour of your time worth?

That's irrelevant. A human is not going to be solving the challenge by hand, nor is the computer of a legitimate user going to be solving the challenge continuously for one hour. The real question is, does the challenge slow down clients enough that the server does not expend outsized resources serving requests of only a few users?

>Even if the attacker is no better at solving the challenge than your browser is, there's no way to tune the monetary cost to be even in the ballpark to the cost imposed to the legitimate users.

No, I disagree. If the challenge takes, say, 250 ms on the absolute best hardware, and serving a request takes 25 ms, a normal user won't even see a difference, while a scraper will see a tenfold slowdown while scraping that website.

michaelt · 1h ago
The problem with proof-of-work is many legitimate users are on battery-powered, 5-year-old smartphones. While the scraping servers are huge, 96-core, quadruple-power-supply beasts.
jsnell · 2h ago
The human needs to wait for their computer to solve the challenge.

You are trading something dirt-cheap (CPU time) for something incredibly expensive (human latency).

Case in point:

> If the challenge takes, say, 250 ms on the absolute best hardware, and serving a request takes 25 ms, a normal user won't even see a difference, while a scraper will see a tenfold slowdown while scraping that website.

No. A human sees a 10x slowdown. A human on a low end phone sees a 50x slowdown.

And the scraper paid one 1/1000000th of a dollar. (The scraper does not care about latency.)

That is not an effective deterrent. And there is no difficulty factor for the challenge that will work. Either you are adding too much latency to real users, or passing the challenge is too cheap to deter scrapers.

fluoridation · 1h ago
>No. A human sees a 10x slowdown.

For the actual request, yes. For the complete experience of using the website not so much, since a human will take at least several seconds to process the information returned.

>And the scraper paid one 1/1000000th of a dollar. (The scraper does not care about latency.)

The point need not be to punish the client, but to throttle it. The scraper may not care about taking longer, but the website's operator may very well care about not being hammered by requests.

avhon1 · 34m ago
But now I have to wait several seconds before I can even start to process the webpage! It's like the internet suddenly became slow again overnight.
jsnell · 1h ago
A proof of work challenge does not throttle the scrapers at steady state. All it does is add latency and cost to the first request.
fluoridation · 1h ago
Hypothetically, the cookie could be used to track the client and increase the difficulty if its usage becomes abusive.
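
A sketch of that hypothetical escalation (purely illustrative; nothing here reflects how Anubis actually tunes difficulty):

  import math

  BASE_BITS = 16  # assumed baseline PoW difficulty (leading zero bits)

  def next_difficulty(requests_on_token: int) -> int:
      # Light users keep the cheap challenge; a client that burns through its
      # token quickly pays progressively more work on each renewal.
      if requests_on_token <= 100:
          return BASE_BITS
      return BASE_BITS + 2 * int(math.log10(requests_on_token))
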
VMG · 5h ago
crawlers can run JS, and also invest into running the Proof-Of-JS better than you can
tjhorner · 4h ago
Anubis doesn't target crawlers which run JS (or those which use a headless browser, etc.) It's meant to block the low-effort crawlers that tend to make up large swaths of spam traffic. One can argue about the efficacy of this approach, but those higher-effort crawlers are out of scope for the project.
Imustaskforhelp · 31m ago
Reminds me of how Wikipedia literally has all its data available for download, in a nice format, just for scrapers (I think), and even THEN some scrapers still scraped Wikipedia and cost it enough money that I am pretty sure an official statement had to be made, or they at least disclosed it.

Even then, man, so many resources could be saved (both the scrapers' and Wikipedia's) if scrapers had the sense not to scrape Wikipedia and instead follow Wikipedia's rules.

fluoridation · 5h ago
If we're presupposing an adversary with infinite money then there's no solution. One may as well just take the site offline. The point is to spend effort in such a way that the adversary has to spend much more effort, hopefully so much it's impractical.
johnea · 1h ago
My biggest bitch is that it requires JS and cookies...

Although the long term problem is the business model of servers paying for all network bandwidth.

Actual human users have consumed a minority of total net bandwidth for decades:

https://www.atom.com/blog/internet-statistics/

Part 4 shows bots out-using humans in 1996 8-/

What are "bots"? This needs to include goggleadservices, PIA sharing for profit, real-time ad auctions, and other "non-user" traffic.

The difference between that and the LLM training data scraping, is that the previous non-human traffic was assumed, by site servers, to increase their human traffic, through search engine ranking, and thus their revenue. However the current training data scraping is likely to have the opposite effect: capturing traffic with LLM summaries, instead of redirecting it to original source sites.

This is the first major disruption to the internet's model of finance since ad revenue took over after the dot bomb.

So far, it's in the same category as the environmental disaster in progress: ownership is refusing to acknowledge the problem and insisting on business as usual.

Rational predictions are that it's not going to end well...

jerf · 36m ago
"Although the long term problem is the business model of servers paying for all network bandwidth."

Servers do not "pay for all the network bandwidth" as if they are somehow being targeted for fees and carrying water for the clients that are somehow getting it for "free". Everyone pays for the bandwidth they use, clients, servers, and all the networks in between, one way or another. Nobody out there gets free bandwidth at scale. The AI scrapers are paying lots of money to scrape the internet at the scales they do.

Imustaskforhelp · 25m ago
The AI scrapers are most likely VC-funded, and all they care about is getting as much data as possible, not worrying about the costs.

They are renting machines at scale too, so bandwidth etc. is definitely cheaper for them. Maybe they use a provider that doesn't have too many bandwidth issues (Hetzner?).

But still, the point is that you might be hosting a website on your small server, and a scraper with its beast of a machine fleet can come and effectively DDoS it looking for data to scrape. Deterring them is what matters, so that the economics finally slide back in our favour.

zb3 · 18m ago
Anubis doesn't use enough resources to deter AI bots. If you really want to go this way, use React, preferably with more than one UI framework.
iefbr14 · 5h ago
I wouldn't be surprised if just delaying the server response by some 3 seconds would have the same effect on those scrapers that Anubis claims to have.
kingstnap · 1h ago
Wasting 3 seconds of a computer's time costs essentially nothing, while wasting 3 seconds of a person's time is expensive.

That is literally an anti-human filter.

Imustaskforhelp · 28m ago
From tjhorner on this same thread

"Anubis doesn't target crawlers which run JS (or those which use a headless browser, etc.) It's meant to block the low-effort crawlers that tend to make up large swaths of spam traffic. One can argue about the efficacy of this approach, but those higher-effort crawlers are out of scope for the project."

So it's meant to block low-effort crawlers, which can still cause damage if you don't deal with them. A 3-second deterrent seems good in that regard. Maybe the 3-second deterrent could come in the form of rate-limiting an IP? But they might use swaths of IPs :/

ranger_danger · 2h ago
Yea I'm not convinced unless somehow the vast majority of scrapers aren't already using headless browsers (which I assume they are). I feel like all this does is warm the planet.
Philpax · 5h ago
The argument isn't that it's difficult for them to circumvent - it's not - but that it adds enough friction to force them to rethink how they're scraping at scale and/or self-throttle.

I personally don't care about the act of scraping itself, but the volume of scraping traffic has forced administrators' hands here. I suspect we'd be seeing far fewer deployments if the scrapers behaved themselves to begin with.

davidclark · 5h ago
The OP author shows that the cost to scrape an Anubis site is essentially zero since it is a fairly simple PoW algorithm that the scraper can easily solve. It adds basically no compute time or cost for a crawler run out of a data center. How does that force rethinking?
Philpax · 5h ago
The cookie will be invalidated if shared between IPs, and it's my understanding that most Anubis deployments are paired with per-IP rate limits, which should reduce the amount of overall volume by limiting how many independent requests can be made at any given time.

That being said, I agree with you that there are ways around this for a dedicated adversary, and that it's unlikely to be a long-term solution as-is. My hope is that the act of having to circumvent Anubis at scale will prompt some introspection (do you really need to be rescraping every website constantly?), but that's hopeful thinking.

yborg · 1h ago
> do you really need to be rescraping every website constantly

Yes, because if you believe you out-resource your competition, by doing this you deny them training material.
hooverd · 5h ago
The problem with crawlers is that they're functionally indistinguishable from your average malware botnet in behavior. If you saw a bunch of traffic from residential IPs using the same token, that's a big tell.
yuumei · 5h ago
> The CAPTCHA forces visitors to solve a problem designed to be very difficult for computers but trivial for humans.

> Anubis – confusingly – inverts this idea.

Not really, AI easily automates traditional captchas now. At least this one does not need extensions to bypass.


immibis · 3h ago
The actual answer to how this blocks AI crawlers is that they just don't bother to solve the challenge. Once they do bother solving the challenge, the challenge will presumably be changed to a different one.
lxgr · 5h ago
> This isn’t perfect of course, we can debate the accessibility tradeoffs and weaknesses, but conceptually the idea makes some sense.

It was arguably never a great idea to begin with, and stopped making sense entirely with the advent of generative AI.

anotherhue · 5h ago
Surely the difficulty factor scales with the system load?
ksymph · 5h ago
Reading the original release post for Anubis [0], it seems like it operates mainly on the assumption that AI scrapers have limited support for JS, particularly modern features. At its core it's security through obscurity; I suspect that as usage of Anubis grows, more scrapers will deliberately implement the features needed to bypass it.

That doesn't necessarily mean it's useless, but it also isn't really meant to block scrapers in the way TFA expects it to.

[0] https://xeiaso.net/blog/2025/anubis/

jhanschoo · 5h ago
Your link explicitly says:

> It's a reverse proxy that requires browsers and bots to solve a proof-of-work challenge before they can access your site, just like Hashcash.

It's meant to rate-limit access by requiring client-side compute that is light enough for legitimate human users and responsible crawlers, but taxing enough to impose a real cost on indiscriminate crawlers that request host resources excessively.

It indeed mentions that lighter crawlers do not implement the right functionality to execute the JS, but that's not the main reason it is thought to be sensible. It's a challenge saying: you need to want the content badly enough to spend the amount of compute an individual typically has on hand before I'll do the work of serving you.
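
The client-side work is essentially a Hashcash loop; a generic Python sketch (the difficulty and challenge format here are placeholders, not Anubis's exact scheme):

  import hashlib
  from itertools import count

  def solve(challenge: str, difficulty_bits: int = 16) -> int:
      # Find a nonce such that SHA-256(challenge + nonce) starts with
      # `difficulty_bits` zero bits. Verifying the answer costs one hash.
      target = 1 << (256 - difficulty_bits)
      for nonce in count():
          digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
          if int.from_bytes(digest, "big") < target:
              return nonce

  nonce = solve("example-challenge-string")
  print("solved with nonce", nonce)   # the server re-hashes once to check it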

ksymph · 4h ago
Here's a more relevant quote from the link:

> Anubis is a man-in-the-middle HTTP proxy that requires clients to either solve or have solved a proof-of-work challenge before they can access the site. This is a very simple way to block the most common AI scrapers because they are not able to execute JavaScript to solve the challenge. The scrapers that can execute JavaScript usually don't support the modern JavaScript features that Anubis requires. In case a scraper is dedicated enough to solve the challenge, Anubis lets them through because at that point they are functionally a browser.

As the article notes, the work required is negligible, and as the linked post notes, that's by design. Wasting scraper compute is part of the picture to be sure, but not really its primary utility.

ranger_danger · 2h ago
The compute also only seems to happen once, not for every page load, so I'm not sure how this is a huge barrier.
superkuh · 1h ago
Kernel.org* just has to actually configure Anubis rather than deploying the default broken config: enable the meta-refresh proof of work rather than relying on the bleeding-edge JavaScript application proof of work that only works in corporate browsers.

* or whatever site the author is talking about; his site is currently inaccessible due to the number of people trying to load it.

WesolyKubeczek · 5h ago
I disagree with the post author in their premise that things like Anubis are easy to bypass if you craft your bot well enough and throw the compute at it.

Thing is, the actual lived experience of webmasters tells that the bots that scrape the internets for LLMs are nothing like crafted software. They are more like your neighborhood shit-for-brain meth junkies competing with one another who makes more robberies in a day, no matter the profit.

Those bots are extremely stupid. They are worse than script kiddies’ exploit-searching software. They keep banging the pages without regard to how often, if ever, they change. If they were even 1/10th as well-behaved as many scraping companies’ software, they wouldn’t be a problem in the first place.

Since these bots are so dumb, anything that is going to slow them down or stop them in their tracks is a good thing. Short of drone strikes on data centers or accidents involving owners of those companies that provide networks of botware and residential proxies for LLM companies, it seems fairly effective, doesn’t it?

busterarm · 59m ago
Those are just the ones that you've managed to ID as bots.

Ask me how I know.

lousken · 5h ago
Aren't you happy? At least you get to see the catgirl.
rnhmjoj · 5h ago
I don't understand, why do people resort to this tool instead of simply blocking by UA string or IP address. Are there so many people running these AI crawlers?

I blackholed some IP blocks of OpenAI, Mistral and another handful of companies and 100% of this crap traffic to my webserver disappeared.

mnmalst · 5h ago
Because that solution simply does not work for all. People tried and the crawlers started using proxies with residential IPs.
busterarm · 32m ago
Lots of companies run these kind of crawlers now as part of their products.

They buy proxies and rotate through proxy lists constantly. It's all residential IPs, so blocking IPs actually hurts end users. Often it's the real IPs of VPN service customers, etc.

There are lots of companies around that you can buy this type of proxy service from.

hooverd · 5h ago
less savory crawlers use residential proxies and are indistinguishable from malware traffic
WesolyKubeczek · 5h ago
You should read more. AI companies use residential proxies and mask their user agents with legitimate browser ones, so good luck blocking that.
rnhmjoj · 5h ago
Which companies are we talking about here? In my case the traffic was similar to what was reported here[1]: these are crawlers from Google, OpenAI, Amazon, etc. They are really idiotic in behaviour, but at least they report themselves correctly.

[1]: https://pod.geraspora.de/posts/17342163

nemothekid · 5m ago
OpenAI/Anthropic/Perplexity aren't the bad actors here. If they are, they are relatively simple to block - why would you implement an Anubis PoW MITM proxy when you could just block on UA?

I get the sense many of the bad actors are simply poor copycats that are poorly building LLMs and are scraping the entire web without a care in the world

majorchord · 2h ago
> AI companies use residential proxies

Source:

Macha · 1h ago
Source: Cloudflare

https://blog.cloudflare.com/perplexity-is-using-stealth-unde...

Perplexity's defense is that they're not doing it for training/KB-building crawls but for answering dynamic user queries, and this is apparently better.

ranger_danger · 1h ago
I do not see the words "residential" or "proxy" anywhere in that article... or any other text that might imply they are using those things. And personally... I don't trust crimeflare at all. I think they and their MITM-as-a-service has done even more/lasting damage to the global Internet and user privacy in general than all AI/LLMs combined.

However, if this information is accurate... perhaps site owners should allow AI/bot user agents but respond with different content (or maybe a 404?) instead, to try to prevent it from making multiple requests with different UAs.

jayrwren · 5h ago
Literally the top link when I search for his exact text "why are anime catgirls blocking my access to the Linux kernel?" is https://lock.cmpxchg8b.com/anubis.html. Maybe Tavis needs more google-fu. Maybe that includes using DuckDuckGo?
Macha · 1h ago
The top link when you search the title of the article is the article itself?

I am shocked, shocked I say.

PaulHoule · 6h ago
I think a lot of it is performative and a demonstration that somebody is a member of a tribe, particularly the part about the kemonomimi [1] (e.g. people who are kinda like furries but have better taste in art)

[1] https://safebooru.donmai.us/posts?tags=animal_ears

dathinab · 5h ago
you are overthinking

it's as simple as this: having a nice picture there makes the whole thing feel nicer and gives it a bit of personality

so you put in some picture/art you like

that's it

similarly, any site using it can change that picture, but there isn't any fundamental problem with the picture, so most don't care to change it