Sadly it's hard to tell if this is an actual DDoS attack or scrapers descending on the site. It all looks very similar.
The search engines always seemed happy to announce that they are in fact GoogleBot/BingBot/Yahoo/whatever and frequently provided you with their expected IP ranges. The modern companies, mostly AI companies, seem to be more interested in flying under the radar, and have less respect for the internet infrastructure as a whole. So we're now at a point where I can't tell if it's an ill-willed DDoS attack or just shitty AI startup number 7 reloading training data.
jeroenhd · 12h ago
> The modern companies, mostly AI companies, seem to be more interested in flying under the radar, and have less respect for the internet infrastructure as a whole
I think that makes a lot of sense. Google's goal is (or perhaps used to be) providing a network of links. The more they scrape you, the more visitors you may end up receiving, and the better your website performs (monetarily, or just in terms of providing information to the world).
With AI companies, the goal is to consume and replace. In their best case scenario, your website will never receive a visitor again. You won't get anything in return for providing content to AI companies. That means there's no reason for website administrators to permit the good ones, especially for people who use subscriptions or ads to support their website operating costs.
eadmund · 9h ago
> With AI companies, the goal is to consume and replace.
I don’t think that’s really true. The AI companies’ goal is to consume and create something else.
> You won't get anything in return for providing content to AI companies.
That was the original problem with websites in general, and the ‘solution’ was ads. It would be really, really cool if the thing which finally makes micropayments happen is AI.
And then we humans could use micropayments too. Of course, the worst of both worlds would be micropayments and ads.
mrweasel · 8h ago
You can have non-commercial websites. Plenty of people have blogs or personal websites, sites that support a business, sites where you already pay. In this case it was the ScummVM website, an open source project.
A lot of those sites are at risk of being made irrelevant by AI companies who really don't give a shit about your motivations for doing something for free. If their crawler kills your site and their LLM steals views by regurgitating answers based on your work, so be it, you served your purpose.
If you want to talk payment: Ask the AI companies to pay you when they generate an answer based on your work, a license fee. That will kill their business model pretty quickly.
no_wizard · 5h ago
The unfortunate truth is this indeed needs to be legislated so the penalties are severe and it's easy for users to set up the measures and enforce against violations without fear.
Fair use is being abused big time by AI companies, and was by search engines before that.
eadmund · 7h ago
> … their crawler kills your site and their LLM steals views by regurgitating answers based on your work
How is that different from a human being reading my underwater basket weaving site and starting his own, ‘stealing’ ‘my’ views? Or a thousand human beings out of the billions on Earth doing the same thing?
BobaFloutist · 2h ago
The same way it's different if someone throws a bullet at you from their hand from 10 feet away versus propelling tens of them a second from a fully automatic rifle from 50 feet away.
Sure, in either situation you could say "they're trying to harm me using bullets," but one of them is much more likely to succeed, and we probably shouldn't treat the situations, or the costs to your well-being, as legally identical.
mrweasel · 6h ago
That person might actually contribute some of their own knowledge and experience. Also you probably put the information out there because you want to spread it, but once it's hidden behind an LLM chat prompt the community dies.
You're correct that there's not really anything stopping a person from ripping you off, tweaking your work just enough that it's not a copyright violation. Unless that person has a really good grasp of the topic themselves and can contribute, it will become clear that they are getting the content elsewhere, and the readers will end up there in the end. Many, though obviously not all, will also provide attribution, something LLMs rarely do.
Then you have the issue that the person publishing something on their own little server now has to deal with commercial companies hammering their site into the ground, just so someone can do an automated version of content theft?
A lot of things people could potentially do are minor issues, until it's automated and commercialized.
no_wizard · 5h ago
The automation aspect alone is enough to differentiate it from a human, and it makes the impact 1000x worse
poincaredisk · 6h ago
Honestly?
I have a personal blog. It's free. I write because I want humans to read my work, not because I want to provide free labor to AI companies.
This argument doesn't work here.
philipwhiuk · 10h ago
It's DDoS either way even if it's not an attack.
piokoch · 12h ago
Yes, search engines were not hiding, as the website owner's interest was involved here as well - without those search bots their sites would not be indexed and searchable on the Internet. So there was a kind of win-win situation, in most typical cases at least; there were exceptions, as for instance publishers complained about deep links because their ad revenue was hurt.
AI scraping bots provide zero value for site owners.
Valodim · 49m ago
Is this really true? If I have a marketing website for a product, isn't it in my interest to have that marketing incorporated in AI models?
CaptainFever · 13h ago
> To me, Anubis is not only a blocker for AI scrapers. Anubis is a DDoS protection.
Anubis is DDoS protection, just with updated marketing. These tools have existed forever, such as CloudFlare Challenges, or https://github.com/RuiSiang/PoW-Shield. Or HashCash.
I keep saying that Anubis really has nothing much to do with AI (e.g. some people might mistakenly think that it magically "blocks AI scrapers"; it only slows down abusive-rate visitors). It really only deals with DoS and DDoS.
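At its core it's the same hashcash idea: the server hands out a challenge, the client burns CPU finding a nonce, and the server verifies with a single hash. A minimal sketch of that scheme (illustrative only, not Anubis's actual code):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// solve brute-forces a nonce so that sha256(challenge + nonce)
// starts with `difficulty` hex zeroes. Finding it is expensive;
// checking it server-side costs one hash.
func solve(challenge string, difficulty int) string {
	target := strings.Repeat("0", difficulty)
	for nonce := 0; ; nonce++ {
		sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
		if strings.HasPrefix(hex.EncodeToString(sum[:]), target) {
			return strconv.Itoa(nonce)
		}
	}
}

func main() {
	// Each extra zero of difficulty means roughly 16x more client work.
	fmt.Println(solve("server-issued-challenge", 4))
}
```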
I don't understand why people are using Anubis instead of all the other tools that already exist. Is it just marketing? Saying the right thing at the right time?
Imustaskforhelp · 12h ago
I agree with you that it is in fact DDoS protection, but still, given the fact that it is open source and created by a really cool dev (she is awesome),
I think I don't really mind it gaining popularity.
And also they had created it out of their own necessity which is also really nice.
Anubis is getting real love out there and I think I am all for it. I personally host a lot of my stuff on Cloudflare due to it being free with Cloudflare Workers, but if I ever have a VPS, I am probably going to use Anubis as well
alias_neo · 12h ago
I'm not sure why there are so many negative comments here. This looks nice, appears to work, is open source and MIT licensed. Why _wouldn't_ I use this?
fmajid · 12h ago
It also doesn’t cede more market power to CloudFlare, which tends to block non-mainstream browsers, users with adblockers, Tor, or cookies and JavaScript disabled.
amarcheschi · 11h ago
I don't know what I have done, but I'd say I get blocked by Cloudflare a few visits per week. It's not a huge deal but it's very annoying
GoblinSlayer · 10h ago
It's usually explained as the site owner setting stringent security settings.
CaptainFever · 11h ago
This tool does block JavaScript-disabled browsers though. There's a comment here that complained about the pain Anubis causes with cookie-less browsers, but they got downvoted.
gkbrk · 11h ago
There's also "checkpoint" [1] which works without Javascript. As far as I can tell they cover the same use case with a very similar user experience.
I have plans involving IP reputation and a few other behaviors I've noticed. The main problem is that all my ideas involve cookies.
JodieBenitez · 12h ago
> I don't understand why people are using Anubis instead of all the other tools that already exist. Is it just marketing? Saying the right thing at the right time?
Care to share existing solutions that can be self-hosted? (genuine question, I like how Anubis works, I just want something with a more neutral look and feel)
dspillett · 3h ago
> I just want something with a more neutral look and feel.
If it is perfect for your needs other than the look, you could update the superficial parts to match your liking?
If it is designed in such a way as to make this difficult, such as if the visible content & styling is tangled within the code rather than all in static assets (I've not looked at the code myself yet), then perhaps raise an issue suggesting that this is changed (or if you are a coder yourself, perhaps do so and raise a pull request for your changes).
Given how popular the tool seems to be becoming, I expect this sort of theming will be an official feature eventually anyway, if you are patient.
Of course the technique it uses is well known and documented, so there may already be other good implementations that match your visual needs without any of the above effort.
Knowing something exists is half the challenge. Never used it, but maybe ease of use/setup or license?
areyourllySorry · 11h ago
pow shield does not offer a furry loading screen so it can't be as good
superkuh · 5h ago
All the other tools don't actually work. What I mean is that they block far, far, more than they intend to. Anubis actually works on every weird and niche browser I've tried. Which is to say, it lets actual human people through even if they aren't using Chrome.
CloudFlare doesn't do that. Cloudflare's false positive rate is extremely high, as are the others. Mostly because they all depend on bleeding edge JS and browser functions (CORS, etc) for fingerprinting functionality.
Cloudflare is for for-profit sites and other situations where you don't care if you block poor people, because they can't give you money anyway. Anubis is for if you want everyone to be able to access your website.
prmoustache · 3h ago
I doubt it works with dillo or lynx.
touggourt · 3h ago
if it doesn't work yet, you can suggest a patch
GoblinSlayer · 11h ago
The readme explains that it's for the case when you don't use Cloudflare; also it's open source, analogous to PoW Shield, but with lighter dependencies.
GoblinSlayer · 6h ago
Though PoW Shield uses a simple symmetric signature, while Anubis uses ed25519/JWT.
immibis · 13h ago
marketing plus a product that Just Does The Thing, it seems like. No bullshit.
btw it only works on AI scrapers because they're DDoSes.
CaptainFever · 11h ago
Not all DDoSes are AI-related, and not all AI scrapers are DDoSes.
superkuh · 5h ago
But almost all DoS's we're talking about are from corporations. The real non-human danger.
chrisnight · 20h ago
> Solving the challenge–which is valid for one week once passed–
One thing that I've noticed recently with the Arch Wiki adding Anubis, is that this one week period doesn't magically fix user annoyances with Anubis. I use Temporary Containers for every tab, which means that I constantly get Anubis regenerating tokens, since the cookie gets deleted as soon as the tab is closed.
Perhaps this is my own problem, but given the state of tracking on the internet, I do not feel it is an extremely out-of-the-ordinary circumstance to avoid saving cookies.
philipwhiuk · 10h ago
I think it's absolutely your problem. You're ignoring all the cache lifetimes on assets.
selfhoster11 · 5h ago
OK, so what? Keeping persistent state on your machine shouldn't be mandatory for a comfortable everyday internet browsing experience.
orthecreedence · 1h ago
What then do you suggest as a good middle ground between website publishers and website enjoyers? Doing a one-time challenge and storing the result seems like a really good compromise between all parties. But that's not good enough! So what is?
jsheard · 20h ago
It could be worse, the main alternative is something like Cloudflare's death-by-a-thousand-CAPTCHAs when your browser settings or IP address put you on the wrong side of their bot detection heuristics. Anubis at least doesn't require any interaction to pass.
Unfortunately nobody has a good answer for how to deal with abusive users without catching well behaved but deliberately anonymous users in the crossfire, so it's just about finding the least bad solution for them.
lousken · 20h ago
I hated everyone who enabled the Cloudflare validation thing on their website, because I was blocked for months (I got stuck on that captcha that kept refusing my Firefox). Eventually they fixed it but it was really annoying.
goku12 · 16h ago
The CF verification page still appears far too often in some geographic regions. It's such an irritant that I just close the tab and leave when I see it. It's so bad that seeing the Anubis page instead is actually a big relief! I consider the CF verification and its enablers a shameless attack on the open web - a solution nearly as bad as the problem it tries to solve.
_bin_ · 13h ago
Forget esoteric areas, I'm an average American guy who gets them running from a residential IP or cell IP. It even happens semi-frequently on my iPhone which is insane. I guess I must have "bot-like" behavior in my browsing, even from a cell.
WesolyKubeczek · 10h ago
I noticed that Google happily puts you on its shitlist as soon as you use any advanced parameters on your searches, such as “filetype:” or “inurl:” or “site:”.
_bin_ · 3h ago
This probably has something to do with it. I probably tend to move faster than average and am "bot-like" in that I sort of "scrape": search for something and quickly open all relevant tabs to review, page through them, search again. If while I'm going through I have something else I'd like to find, I'll fire up yet another tab and pop open all relevant tabs from that. Etc.
throwaway562if1 · 14h ago
I am still unable to pass CF validation on my desktop (sent to infinite captcha loop hell). Nowadays I just don't bother with any website that uses it.
imcritic · 10h ago
Too many sites that used to be good installed that shit. And the weird part is that on desktop only Chromium fails to pass the captcha; no issues on Firefox. But Chromium is my main browser and sometimes I'm too lazy/uncomfortable opening a 2nd browser for those sites.
qiu3344 · 9h ago
I'd even argue that Anubis is universally superior in this domain.
A sufficiently advanced web scraper can build a statistical model of fingerprint payloads that are categorized by CF as legit and change their proxy on demand.
The only person who will end up blocked is the regular user.
There is also a huge market of proprietary anti-bot solvers, not to mention services that charge you per captcha-solution. Usually it's just someone who managed to crack the captcha and is generating the solutions automatically, since the response time is usually a few hundred milliseconds.
This is a problem with every commercial Anti-bot/captcha solution and not just CF, but also AWS WAF, Akamai, etc.
xena · 8h ago
The pro gamer move is to use risk calculation as a means of determining when to throw a challenge, not when to deny access :)
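In other words, something roughly like this (thresholds invented for illustration, not Anubis internals):

```go
package main

import "fmt"

// action maps a risk score to a response: the challenge is the
// middle ground, and outright denial is reserved for blatant abuse.
func action(risk float64) string {
	switch {
	case risk < 0.3:
		return "allow" // looks human enough, wave it through
	case risk < 0.9:
		return "challenge" // suspicious: make it spend CPU on PoW
	default:
		return "deny"
	}
}

func main() {
	fmt.Println(action(0.5)) // prints "challenge"
}
```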
trod1234 · 20h ago
> Unfortunately nobody has a good answer for how to deal with abusive users without catching well behaved but deliberately anonymous users in the crossfire...
Uhh, that's not right.
There is a good answer, but no turnkey solution yet.
The answer is making each request cost a certain amount of something from the person, and increased load by that person comes with increased cost on that person.
halosghost · 20h ago
Note that this is actually one of the things Anubis does. That's what the proof-of-work system is, it just operates across the full load rather than targeted to a specific user's load. But, to the GP's point, that's the best option while allowing anonymous users.
All the best,
-HG
Spivak · 20h ago
I know that you mean a system that transfers money but you are also describing Anubis because PoW is literally to make accessing the site cost more and scale that cost proportional to the load.
trod1234 · 20h ago
> I know that you mean a system that transfers money ....
No, cost is used in the fullest abstract meaning of the word here.
Time cost, effort cost, monetary cost, work cost - so long as there is a functional limitation that prevents resource exhaustion, that is the point.
lelandbatey · 19h ago
If cost can be anything, does Anubis implement such a system then, by using proof-of-work as the cost function?
fc417fc802 · 9h ago
Sort of. Anubis is frontloading the cost all at once and then amortizing it over a large number of subsequent requests. That detail is what's causing the issue when browsing with additional privacy measures.
This makes discussions such as this have a negative ROI for an average commenter.
Spamming scam and grift links still has a positive ROI, albeit a slightly smaller one.
I use a certain online forum which sometimes makes users wait 60 or 900 seconds before they can post. It has prevented me from making contributions multiple times.
immibis · 12h ago
I'm using one with a 5 in 14400 seconds timer right now. Ditto.
gruez · 16h ago
>It could be worse, the main alternative is something like Cloudflares death-by-a-thousand-CAPTCHAs when your browser settings or IP address put you on the wrong side of their bot detection heuristics.
Cloudflare's checkbox challenge is probably the best of the challenge systems. Other security systems are far worse, requiring either something to be solved, or a more annoying action (eg. holding a button for 5 seconds).
Dylan16807 · 14h ago
Checking a box is fine when it lets you through.
The problem is when cloudflare doesn't let you through.
gruez · 6h ago
>The problem is when cloudflare doesn't let you through.
Don't use an unusual browser configuration then, like spoofing user-agents or whatever? If you're doing it for "privacy" reasons, it's likely counterproductive. The fact that cloudflare can detect it means that the spoofing isn't doing a very good job, and therefore you're making yourself more fingerprintable.
Dylan16807 · 5h ago
There's a whole lot of things that can count as "unusual" that aren't spoofing, and telling people not to be super vague "unusual" is a terrible solution.
gruez · 5h ago
>There's a whole lot of things that can count as "unusual" that aren't spoofing
Examples?
Dylan16807 · 3h ago
Ad block, other blocking, third party cookie restrictions, all the stuff firefox changes when you toggle resistFingerprinting. From your other comment "users with no google cookies" and "connecting from VPN".
Punishing people for not having Google cookies is probably the most obnoxious one.
imcritic · 10h ago
Same problem with Google's captchas: solving them doesn't always mean you will be let in. That's outrageous, like isn't that the whole point?
fmbb · 8h ago
No, the whole point is you are helping machine learning training. Doing work for free.
gruez · 6h ago
It really isn't. If they were purely focused on getting training data, they would give more captchas to everyone, not just the users with no google cookies, connecting from VPN, and with weird browser configurations. The fact of the matter is that all those attributes are more "suspicious" than average, and therefore they want to up the cost for getting past the captcha.
notpushkin · 14h ago
Yeah. A “drag this puzzle piece” captcha style is also relatively easy, but things like reCaptcha or hCaptcha are just infuriating.
For pure POW (no fingerprinting), mCaptcha is a nice drop-in replacement you can self-host: https://mcaptcha.org/
GoblinSlayer · 9h ago
Looks like mCaptcha is a login captcha, while Cloudflare and Anubis intercept any access, including DDoS.
ashkulz · 1h ago
I too use Temporary Containers, and my solution is to use a named container and associate that site with the container.
imcritic · 10h ago
For me the biggest issue with the Arch Wiki adding Anubis is that it doesn't let me in when I open it on mobile. I am using Cromite: it doesn't support extensions, but it has an ABP-style blocker integrated.
selfhoster11 · 5h ago
I am low-key shocked that this has become a thing on Arch Wiki, of all places. And that's just to access the main page, not even for any searches. Arch Wiki is the place where you often go when your system is completely broken, sometimes to the extent that some clever proof of work system that relies on JS and whatever will fail. I'm sure they didn't decide this lightly, but come on.
TiredOfLife · 13h ago
It's not a problem. You have configured your system to show up as a new visitor every time you visit a website. And you are getting expected behaviour.
bscphil · 18h ago
It's even worse if you block cookies outright. Every time I hit a new Anubis site I scream in my head because it just spins endlessly and stupidly until you enable cookies, without even a warning. Absolutely terrible user experience; I wouldn't put any version of this in front of a corporate / professional site.
Dylan16807 · 16h ago
Blocking cookies completely is just asking for a worse method of tracking sessions. It's fine for a site to be aware of visits. As someone who argues that sites should work without javascript, blocking all cookies strikes me as doing things wrong.
bscphil · 13h ago
A huge proportion of sites (a) use cookies, (b) don't need cookies. You can easily use extensions to enable cookies for the sites that need them, while leaving others disabled. Obviously some sites are going to do shitty things to track you, but they'd probably be doing that anyway.
The issue I'm talking about is specifically how frustrating it is to hit yet another site that has switched to Anubis recently and having to enable cookies for it.
Dylan16807 · 5h ago
The next best alternative to a basic session cookie isn't doing shitty things, it's either using your IP and praying that doesn't break, or putting the session token into each link.
There's no real way to hide that you're visiting the site and clicking multiple pages during that visit, so I don't see what's so bad about accepting a first party cookie for an hour.
xena · 8h ago
Hi. Developer of Anubis here. How am I meant to store state in the client without cookies if JavaScript is also disabled? Genuinely curious.
GoblinSlayer · 9h ago
You would prefer the cookie embedded in url?
goku12 · 15h ago
I will take Anubis any day over its alternative - the cloudflare verification page. I just close the tab as soon as I see it.
jezek2 · 18h ago
If you want to browse the web without cookies (and without JS, in a usable manner) you may try FixProxy[1]. It has direct support for Anubis in the development version.
Browsers that have cookies and/or JS disabled have been getting broken experiences for well over a decade, it's hard to take this criticism seriously when professional sites are the most likely to break in this situation.
jillyboel · 19h ago
> One thing that I've noticed recently with the Arch Wiki adding Anubis
Is that why it now shows that annoying slow to load prompt before giving me the content I searched for?
esseph · 19h ago
Would you like to propose an alternative solution that meets their needs and on their budget?
goku12 · 15h ago
Anubis has a 'slow' and a 'fast' mode [1], with fast mode selected by default. It used to be so fast that I rarely got time to read anything on the page. I don't know why it's slower now - it could be that they're using the slower algorithm, or else the algorithm itself may have become slower. Either way, it shouldn't be too hard to modify it with a different algorithm or make the required work a parameter. This of course has the disadvantage of making it easier for the scrapers to get through.
The DIFFICULTY environment variable already allows for configuring how many iterations the program will run (in powers of 10).
The fast/slow selection still applies, but if you put up the difficulty, even the fast version will take some time.
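For reference, that knob is just an environment variable on the Anubis process; an illustrative deployment excerpt (image path and exact semantics may vary between versions):

```yaml
# docker-compose excerpt (illustrative)
services:
  anubis:
    image: ghcr.io/techarohq/anubis:latest
    environment:
      BIND: ":8923"                # where Anubis listens
      TARGET: "http://backend:80"  # the protected upstream
      DIFFICULTY: "5"              # more work per challenge
```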
jillyboel · 19h ago
a static cache for anyone not logged in, and only doing this check when you are authenticated which gives access to editing pages?
edit: Because HN is throwing "you're posting too fast" errors again:
> That falls short of the "meets their needs" test. Authenticated users already have a check (i.e., the auth process). Anubis is to stop/limit bots from reading content.
Arch Wiki is a high value target for scraping so they'll just solve the anubis challenge once a week. It's not going to stop them.
pynappo · 15h ago
> Arch Wiki is a high value target for scraping so they'll just solve the anubis challenge once a week. It's not going to stop them.
The goal of Anubis isn't to stop them from scraping entirely, but rather to slow down aggressive scraping (e.g. sites with lots of pages being scraped every 6 hours[1]) so that the scraping doesn't impact the backend nearly as much
The point of a static cache is that your backend isn't impacted at all.
glenngillen · 17h ago
That falls short of the "meets their needs" test. Authenticated users already have a check (i.e., the auth process). Anubis is to stop/limit bots from reading content.
lelanthran · 12h ago
> Arch Wiki is a high value target for scraping so they'll just solve the anubis challenge once a week.
ISTR that Anubis allows the site-owner to control the expiry on the check; if you're still getting hit by bots, turn the check to 5s with a lower "work" effort so that every request will take (say) 2s, and only last for 5s.
(Still might not help though, because that optimises for bots at the expense of humans - a human will only do maybe one actual request every 30 - 200 seconds, while a bot could do a lot in 5s).
fc417fc802 · 9h ago
Rather than a time to live you probably want a number of requests to live. Decrement a counter associated with the token at every request until it expires.
An obvious followup is to decrement it by a larger amount if requests are made at a higher frequency.
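A sketch of that bookkeeping (invented names and numbers, not Anubis code):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// budget is the "requests to live" granted when a token passes PoW.
type budget struct {
	remaining int
	lastSeen  time.Time
}

// Limiter charges each request against its token's budget; rapid
// bursts are charged more, so scrapers exhaust theirs quickly.
type Limiter struct {
	mu      sync.Mutex
	budgets map[string]*budget
}

func (l *Limiter) Allow(token string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.budgets[token]
	if !ok {
		b = &budget{remaining: 1000} // granted at challenge time
		l.budgets[token] = b
	}
	cost := 1
	if time.Since(b.lastSeen) < time.Second {
		cost = 10 // higher-frequency access burns the budget faster
	}
	b.lastSeen = time.Now()
	b.remaining -= cost
	return b.remaining > 0 // exhausted: re-issue the PoW challenge
}

func main() {
	l := &Limiter{budgets: make(map[string]*budget)}
	fmt.Println(l.Allow("some-token")) // true until the budget runs out
}
```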
CaptainFever · 11h ago
Does anyone know if static caches work? No one seems to have replied to that point. It seems like a simple and user-friendly solution.
xena · 7h ago
Caches would only work if the bots were hitting routes that any human had ever hit before.
jillyboel · 6h ago
They'd also work if the bot, or another bot, hits that route before. It's a wiki, the amount of content is finite and each route getting hit once isn't a problem.
butz · 11h ago
As usual, there is a negative side to such protection: I was trying to download some raw files from a git repository and instead of data I got a bunch of HTML. After a quick look it turned out to be the Anubis HTML page. Another issue was with broken links to issue tickets on the main page, where Anubis was asking my wrapper script to solve some hashes.
Lesson here: after deploying Anubis, please carefully check the impact. There might be some unexpected issues.
eadmund · 9h ago
> I was trying to download some raw files from a git repository and instead of data I got a bunch of HTML. After a quick look it turned out to be the Anubis HTML page.
Yup. Anubis breaks the web. And it requires JavaScript, which also breaks the web. It’s a disaster.
lytedev · 9h ago
I'm using a nearly default configuration which seems to not have this problem. curl still works and so do downloads.
I guess if your cookie expired at just the right time that could cause this issue, and that might be worth thinking about, but I think "breaks the web" is overstating it a bit, at least for the default configuration.
ziddoap · 5h ago
I feel like it's much more reasonable to blame the companies & people that are making it a necessity to have some sort of protection like Anubis for ruining the web (over-aggressive scrapers, bot farms, etc.), rather than blaming Anubis.
vachina · 12h ago
It’s not Anubis that saved your website; literally any sort of captcha, or some dumb modal with a button to click through to the real contents, would’ve worked.
These crawlers are designed to work on 99% of hosts, if you tweak your site just so slightly out of spec, these bots wouldn’t know what to do.
boreq · 6h ago
So what you are saying is that it's Anubis that saved their website.
forty · 12h ago
Anubis is nice, but could we have a PoW system integrated in protocols (HTTP or TLS, I'm not sure) so we don't have to require JS?
fc417fc802 · 9h ago
Protocol is the wrong level. Integrate with the browser. Add a PoW challenge header to the HTTP response, receive a PoW solution header with the next request.
forty · 7h ago
I think you've just described a protocol ;)
Yes it could be in higher layer than what I suggested indeed, on top of HTTP sounds good to me.
My rule of thumb is that it should work with curl (which makes it not anti-bot, but just anti-scraper & DDoS, which is what I have a problem with)
fc417fc802 · 7h ago
Ah yeah sloppy wording on my part. I think it should ideally be its own protocol built on top as opposed to integrated into an existing one. Integration is good but mandatory complexity and tight coupling not so much.
selfhoster11 · 5h ago
I'd much prefer for this to be standardised rather than an ad-hoc layer on top of what we have. Our protocols are already complex, and at least what we would be doing is moving that complexity somewhere where it can be handled more conveniently.
fc417fc802 · 23m ago
It would still be standardized. Anyone who wanted to support it would. Those who didn't want to support it wouldn't be burdened. And it could then evolve on its own, gaining variants for layering it on additional underlying protocols.
It's basic separation of responsibilities. It's helpful for reuse but also innovation. For example, the auth scheme baked in to HTTP is pretty much stuck in time and not very useful. We'd likely be better off if it wasn't tightly coupled to something unrelated like that. If I were implementing an HTTP stack I'd want to omit it, but that would make me noncompliant.
tpool · 17h ago
It's so bad we're going to the old gods for help now. :)
Hamuko · 12h ago
I’d sic Yogg-Saron on these scrapers if I could.
gitroom · 15h ago
Kinda love how deep this gets into the whole social contract side of open source. Honestly, it's been a pain figuring out what feels right when folks mix legal rules and personal asks.
lytedev · 14h ago
Yeah I had no idea that some folks would get so passionate about making changes to a piece of FOSS based on a request on a certain footer-esque documentation page.
I think it's a great discussion though that gets to the heart of open source and software freedom and how that can seem orthogonal to business needs depending on how you squint.
KronisLV · 8h ago
> We use a stack consisting of Apache2, PHP-FPM, and MariaDB to host the web applications.
Oh hey, that’s a pretty utilitarian stack and I’m happy to see MariaDB be used out there.
Anubis is also really cool. I do imagine that proof of work might become more prevalent in the future to deal with the sheer number of bots and bad actors (shame that they exist) out there, though in the case of hijacked devices it might just slow them down, hopefully to a manageable degree, rather than IP banning them altogether.
I do wonder if we’ll ever see HTTP only versions of PoW too, not just JS based options, though that might need to be a web standard or something.
qiu3344 · 9h ago
As someone who has a lot of experience with (not AI related) web scraping, fingerprinting and WAFs, I really like what Anubis is doing.
Amazon, Akamai, Kasada and other big players in the WAF/Antibot industry will charge you millions for the illusion of protection and half-baked javascript fingerprint collectors.
They usually calculate how "legit" your request is based on ambiguous factors, like the vendor name of your GPU (good luck buying flight tickets in a VM) or how anti-aliasing is implemented on your fonts/canvas. Total bullshit. Most web scrapers know how to bypass it. Especially the malicious ones.
But the biggest reason why I'm against these kind of systems is how they support the browser mono-culture. Your UA is from Servo or Ladybird? You're out of luck.
That's why the idea of choosing a purely browser-agnostic way of "weighing the soul" of a request resonates highly with me.
Keep up the good work!
xena · 8h ago
Thanks! I'm going out of my way to make sure smaller browsers like Pale Moon aren't locked out when I add reputation into the equation. One of my prototypes that would work in concert with other changes works in links too :)
pmlnr · 6h ago
Does anyone know a solution that works without JS?
ximm · 6h ago
The client must provide a proof of work. There is no standard for that, so the only way is to implement the client-side code in JavaScript.
It would be great if there was a standard for that so that all kinds of clients knew how to provide a proof of work, e.g. like this:
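(Reconstructing the shape of such an exchange with hypothetical header names; `abc` is the server's challenge and `XYZ` the client's answer:)

```
HTTP/1.1 403 Forbidden
X-PoW-Challenge: abc
X-PoW-Difficulty: 5

GET /page HTTP/1.1
Host: example.org
X-PoW-Solution: XYZ
```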
Where sha256(abcXYZ) would have to start with at least 5 zeros.
some_furry · 1h ago
Write an RFC draft, toss it at the IETF.
Seriously.
prmoustache · 3h ago
I was thinking about adding a link to a page that is hidden in a one-pixel image, the same color as the page background. Hitting it would mean a rule gets added on the firewall to ban that IP for a few weeks.
The only issue I can think of is there may be browsers or browser extensions that preload links to show thumbnails, and users might be banned without knowing why.
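The trap link itself could be as simple as this (hypothetical path; the firewall ban would be driven by something watching the access log, e.g. fail2ban):

```html
<!-- Invisible to humans; also disallow /trap in robots.txt so
     polite crawlers and prefetchers have a chance to avoid it. -->
<a href="/trap" style="position:absolute;left:-9999px"
   rel="nofollow" tabindex="-1" aria-hidden="true">stats</a>
```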
I don’t really understand why this solved this particular problem. The post says:
> As an attacker with stupid bots, you’ll never get through. As an attacker with clever bots, you’ll end up exhausting your own resources.
But the attack was clearly from a botnet, so the attacker isn't paying for the resources consumed. Why don't the zombie machines just spend the extra couple seconds to solve the PoW (at which point, they would apparently be exempt for a week and would be able to continue the attack)? Is it just that these particular bots were too dumb?
judge2020 · 18h ago
Anubis is new, so there may not have been foresight to implement a solver to get around it. Also, I wouldn't be surprised if the botnet actor is using vended software rather than making it themselves, in which case they couldn't quickly implement a solver to continue their attack.
maeln · 12h ago
Most DDoS bots don't bother running JS. A lot of botnets don't even really allow it, because the malware they run on the infected target only allows for basic stuff like simple HTTP requests. This is why they often do some reconnaissance to find pages that take a long time to load, and are therefore probably using a lot of I/O and/or CPU time on the target server. Then they just spam the request.
Huge botnets don't even bother with all that; they just kill you with bandwidth.
cbarrick · 13h ago
I think the explanation "you’ll end up exhausting your own resources" is wrong for this case. I think you are correct that the bots are simply too dumb.
The likely explanation is that the bots are just curling the expensive URLs without a proper JavaScript engine to solve the challenge.
E.g. if I hack a bunch of routers around the world to act as my botnet, I probably wouldn't have enough storage to install Chrome or Selenium. The lightweight solution is just to use curl/wget (which may be pre-installed) or netcat/telnet.
Tiberium · 19h ago
From looking at some of the rules like https://github.com/TecharoHQ/anubis/blob/main/data/bots/head... it seems that Anubis explicitly punishes bots that are "honest" about their user agent - I might be missing something, but isn't this just pressuring anyone who does anything bot-related to just lie about their user agent?
A flat-out user-agent blacklist seems really weird; it's going to reward the companies that are more unethical in their scraping practices than the ones who report their user agent truthfully. From the repo it also seems like all the AI crawlers are set to DENY, which, again, would reward AI companies that don't disclose their identity in the user agent.
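For context, the linked rules are plain user-agent matchers, roughly of this shape (paraphrased illustration, not copied from the repo):

```yaml
# paraphrased illustration of an Anubis bot-policy entry
- name: some-ai-crawler
  user_agent_regex: SomeAICrawler
  action: DENY       # an honest UA is denied outright
- name: browser-like
  user_agent_regex: Mozilla
  action: CHALLENGE  # browser-shaped traffic gets the PoW instead
```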
userbinator · 19h ago
User-agent header is basically useless at this point. It's trivial to set it to whatever you want, and all it does is help the browser incumbents.
Tiberium · 19h ago
You're right, that's why I'm questioning the reason Anubis implemented it this way. Lots of big AI companies are at least honest about their crawlers and have proper user agents (which Anubis outright blocks). So "unethical" companies who change the user-agent to something normal will have an advantage with the way Anubis is currently set up by default.
I'm aware that end users can modify the rules, but in reality most will just use the defaults.
xena · 18h ago
Shitty heuristics buy time to gather data and make better heuristics.
MillironX · 16h ago
Despite broadcasting their user agents properly, the AI companies ignore robots.txt and still waste my server resources. So yeah, the dishonest botnets will have an advantage, but I don't give swindlers a pass just because they rob me to my face. I'm okay with defaults that punish all bots.
goku12 · 15h ago
You can have a bot allow list. I think it's also being planned as a subscription service (not sure about this part).
jeroenhd · 11h ago
From what I can tell from the author's Mastodon, it seems like they're working on a fingerprinting solution to catch these fake bots in an upcoming version based on some passively observed behaviour.
And, of course, the link just shows the default behaviour. Website admins can change them to their needs.
I'm sure there will be workarounds (like that version of curl that has its HTTP stack replaced by Chrome's) but things are ever moving forward.
wzdd · 13h ago
The point of anubis is to make scraping unprofitable by forcing bots to solve a sha256-based proof-of-work captcha, so another point of view is that the explicit denylist is actually saving those bot authors time and/or money.
EugeneOZ · 13h ago
The point is to reduce the server load produced by bots.
Honest AI scrapers use the information to learn, which increases their value, and the owner of the scraped server has to pay for it, getting nothing back — there's nothing honest about it.
Search engines give you visitors, AI spiders only take your money.
ranger_danger · 21h ago
Seems like rate-limiting expensive pages would be much easier and less invasive. Also caching...
And I would argue Anubis does nothing to stop real DDoS attacks that just indiscriminately blast sites with tens of gbps of traffic at once from many different IPs.
PaulDavisThe1st · 20h ago
In the last two months, ardour.org's instance of fail2ban has blocked more than 1.2M distinct IP addresses that were trawling our git repo using http instead of just fetching the goddam repository.
We shut down the website/http frontend to our git repo. There are still 20k distinct IP addresses per day hitting up a site that issues NOTHING but 404 errors.
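A fail2ban jail of that sort might look roughly like this (illustrative paths, patterns, and thresholds; not ardour.org's actual config):

```ini
# /etc/fail2ban/jail.d/git-trawlers.conf
# Three hits on the retired git frontend within ten minutes
# earns a two-week ban.
[git-trawlers]
enabled  = true
port     = http,https
filter   = git-trawlers
logpath  = /var/log/apache2/access.log
maxretry = 3
findtime = 600
bantime  = 1209600

# /etc/fail2ban/filter.d/git-trawlers.conf
[Definition]
failregex = ^<HOST> .* "GET /git/.* HTTP/.*" 404
```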
felsqualle · 15h ago
Hi, author here.
Caching is already enabled, but this doesn’t work for the highly dynamic parts of the site like version history and looking for recent changes.
And yes, it doesn’t work for volumetric attacks with tens of gbps. At this point I don’t think it is a targeted attack, probably a crawler gone really wild. But for this pattern, it simply works.
GoblinSlayer · 7h ago
There's a theory they didn't get through, because it's a new protection method and the bots don't run javascript. It could be as simple as <script>setCookie("letmein=1");reload();</script>
Ocha · 21h ago
Rate limit according to what? It was 35k residential IPs. Rate limit would end up keeping real users out.
linsomniac · 19h ago
Rate limit according to destination URL (the expensive ones), not source IP.
If you have expensive URLs that you can't serve more than, say 3 of at a time, or 100 of per minute, NOT rate limiting them will end up keeping real users out simply because of the lack of resources.
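With nginx, for instance, you can key the limit on something constant so one shared budget covers the expensive routes no matter who is asking (sketch; routes and numbers invented):

```nginx
# http{} context: one shared bucket, ~100 requests/minute total,
# independent of the client's source IP.
limit_req_zone $server_name zone=expensive:1m rate=100r/m;

server {
    listen 80;

    # hypothetical costly endpoints (diffs, blame, history ...)
    location ~ ^/(diff|blame|log)/ {
        limit_req zone=expensive burst=3;
        proxy_pass http://127.0.0.1:8080;
    }
}
```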
danielheath · 19h ago
Right - but if you have, say, 1000 real user requests for those endpoints daily, and thirty million bot requests for those endpoints, the practical upshot of this approach is that none of the real users get to access that endpoint.
Groxx · 17h ago
Yeah, at that point you might as well just turn off the servers. It's even cheaper at cutting off requests, and it'll serve just as many legitimate users.
EugeneOZ · 13h ago
No, it's not equal. These URLs might not be critical for users — they can still browse other parts of the site.
If rate limiting is implemented for, let’s say, 3% of URLs, then 97% of the website will still be usable during a DoS attack.
danielheath · 6h ago
Right, but in terms of users ability to access those 3%, you might as well disable those endpoints entirely instead of rate limiting - much easier to implement, and has essentially the same effect on the availability of the endpoints to users.
pluto_modadic · 19h ago
this feels like something /you can do on your servers/, and that other folks with resource constraints (like time, budget, or the hardware they have) find Anubis valuable.
bastawhiz · 21h ago
Rate limiting does nothing when your adversary has hundreds or even thousands of IPs. It's trivial to pay for residential proxies.
supportengineer · 20h ago
Why aren't there any authorities going after this problem?
danielheath · 19h ago
Most of the "free" analytics tools for android/iOS are "funded" by running residential / "real user" proxies.
They wait until your phone is on wifi / battery, then make requests on behalf of whoever has paid the analytics firm for access to 'their' residential IP pool.
nicce · 18h ago
Do you happen to have any link for blog/something about this?
okanat · 11h ago
1. Which authorities?
2. The US is currently broken, and they are not going to punish the only, albeit unsustainable, growth in their economy.
3. The Internet is global. Even if the EU wants to regulate, will they charge big tech leaders and companies with information tech crimes that pierce the corporate veil? That would ensure that nobody invests in unsustainable AI growth in the EU. However, fucking up the economy and the planet is how the world operates now, and without infinite growth you lose buying power for everything. So everybody else will continue to do fuckery.
4. What can a regulating body do? Force disconnects for large swaths of internet? Then Internet is no more.
folmar · 10h ago
I would go for making the AI companies pay. Identifying end users works for other kinds of abuse, but there are problems at state borders; for monetary solutions it should be easier.
o11c · 19h ago
Because in a "free" nation, that means "free to run malware" not "free from malware".
By far most malware is legal and a portion of its income is used to fund election campaigns.
eikenberry · 20h ago
They could be doing it legally.
toast0 · 14h ago
> And I would argue Anubis does nothing to stop real DDoS attacks that just indiscriminately blast sites with tens of gbps of traffic at once from many different IPs.
Volumetric DDoS and application layer DDoS are both real, but volumetric DDoS doesn't have an opportunity for cute pictures. You really just need a big enough inbound connection and then typically drop inbound UDP and/or IP fragments and turn off http/3. If you're lucky, you can convince your upstream to filter out UDP for you, which gives you more effective bandwidth.
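On Linux that can be as blunt as this (sketch; assumes HTTP/3 is already turned off, so dropping QUIC loses nothing):

```sh
# keep replies to our own outbound traffic (DNS lookups etc.)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# then shed the volumetric junk: IP fragments and unsolicited UDP
iptables -A INPUT -f -j DROP
iptables -A INPUT -p udp -j DROP
```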
lousken · 20h ago
Yes, have everything static (if you can't, use caching), optimize images, rate limit anything you have to generate dynamically
rubyn00bie · 16h ago
Sort of tangential, but I’m surprised folks are still using Apache all these years later. Is there a certain language that makes it better than Nginx? Or is it just the ease of configuration that still pulls people? I switched to Nginx I don’t even know how many years ago and never looked back; just more or less wondering if I should.
mrweasel · 13h ago
Apache does everything and it's fairly easy to configure. If there's something you want to do, Apache mostly knows how, or has a module.
If you run a fleet of servers, all doing different things, Apache is a good choice because all the various uses are going to be supported. It might not be the best choice in each individual case, but it is the one that works in all of them.
I don't know why some are so quick to write off Apache. Is it just because it's old? It's still something like the second most used webserver in the world.
anotherevan · 15h ago
Equally tangential, but I switched from Nginx to Caddy a few years ago and never looked back.
ahofmann · 14h ago
I've been using nginx for what feels like decades, and occasionally I miss the ability to use .htaccess files. It's a very nice way to configure stuff on a server.
forinti · 7h ago
Apache has so much functionality. Why wouldn't anybody use it?
I started using it when Oracle's Webcache wouldn't support newer certificates and I had to keep Oracle Portal running. I could edit the incoming certificate (I had to snip the header and the footer) and put it in a specific header for Portal to accept it.
felsqualle · 13h ago
I use it because that’s the one I’m most familiar with. I've been using it for 15 years and counting. And since it does the job for me, I never had the urge to look into alternatives.
Hey, funny to see my project mentioned here also. Yes, similar in concept.
Some differences:
- Uses HAProxy (duh)
- Proof of work can be either sha256 or argon2
- Optional recaptcha/hcaptcha in addition to the proof of work
- Includes a script for your page that will re-solve the challenge in the background before the cookie expires
There's also a control panel, dns server, etc. I kinda built my own everything because I refused to use bunny/cloudflare/whatever.
One thing I will say though, is that proof of work alone isn't a solution for DDoS mitigation and bot protection! I've seen attackers using a mass of proxies and headless browsers to solve the challenge, or even writing code to extract and solve the challenge directly (https://github.com/lizthegrey/tor-fetcher). To adequately protect against more targeted attacks, you need additional ACLs and heuristics, browser fingerprinting, TLS fingerprinting, IP reputation, etc. I do offer the whole thing set up as a commercial service, but will refrain from too much shilling.
It's fun, and I love seeing similar software help fight the horde of AI scrapers :^)
herpdyderp · 21h ago
Can Anubis be restyled to be more... professional? I like the playfulness, but I know at least some of my clients will not.
samhclark · 21h ago
You can, but they ask that you contact them to set up a contract. It's addressed here on the site:
>Anubis is provided to the public for free in order to help advance the common good. In return, we ask (but not demand, these are words on the internet, not word of law) that you not remove the Anubis character from your deployment.
>If you want to run an unbranded or white-label version of Anubis, please contact Xe to arrange a contract.
Thanks for the information. Just to confirm, with the stock deployment it is not possible to remove the character, but there is an option to set the interface language for users? Spanish is supported?
xena · 14h ago
I think the project is now mature enough for i18n, I've been putting it off because adding it ossifies a lot of the design but I think it's ready now.
lytedev · 20h ago
My "workaround" for this MIT-licensed software that does not allow me a simple and common customization was to have my reverse proxy redirect requests to the images. https://git.lyte.dev/lytedev/nix/pulls/92/files
Hope this is useful to others!
willriches · 20h ago
If you're going to break the social contract, just do so. Jumping through hoops to complicate the matter doesn't solve anything.
lytedev · 20h ago
I did so, though I would hardly call using MIT FOSS for my personal projects a breach of the social contract of open source. This was easier than forking, building a docker image, etc. I'm guessing it will be much easier for others, too, since the recommended config has you dink around with reverse proxy configuration no matter what.
idle_zealot · 17h ago
You are breaking the social contract of the project, not the legal one. The MIT license is the legal contract. The additional social contract is established by the author asking (without legal teeth) that you not do exactly what you did by removing the branding.
Compare to a take-a-penny-leave-a-penny tray from an era past. You are legally allowed to scoop up all the pennies into a bag, and leave the store, then repeat at the neighboring store, and make a few bucks. You'd be an asshole, but not face legal trouble. You "followed the rules" to the letter. But guess what? If you publish an easy how-to guide with "one weird trick" for making some quick cash, and people start adopting your antisocial behavior and emptying out change trays, you've forced the issue and now either a) businesses will stop offering this convenience or b) the rules around it will be tightened and the utility will be degraded. In the concrete case of Anubis, the maintainers may decide to stop contributing their time to this useful software or place a non-FOSS license on it in an attempt to stop gain-maximizing sociopaths from exploiting their efforts.
I even it out by how I prioritize feature requests, bug reports, and the like :)
lytedev · 15h ago
I'm surprised to read this from you, somebody I and many others hold in high regard as accepting and knowledgeable, insulting someone's character because they didn't like some specific aspect of your work or opinions or chose to ignore an ask in this particular use case.
I didn't implement this out of fear or some lack of courage. In fact I had the original avatars up for quite a while. I simply wanted my own logo so visitors wouldn't be potentially confused. It seemed to fit the use case and there was no way to achieve what I wanted without reaching out. I didn't feel comfortable bugging you or anybody on account of my tiny little no-traffic git forge even though, yes, that is what you politely asked for (and did not demand).
I think if you do feel this strongly you might consider changing the software's license or the phrasing of the request in the documentation. Or perhaps making it very clear that no matter how small, you want to be reached out to for the whitelabel version.
I think the success story of Anubis has been awesome to read about and follow and seeing how things unfold further will be fun to watch and possibly even contribute to. I'm personally rooting for you and your project!
lytedev · 15h ago
You are correct in that I ignored a specific request, but you are also ignoring the larger social contract of open source that is also at play. To release software with a certain license has a social component of its own that seems to be unaccounted for here.
Your analogy to me seems imprecise, as analogies tend to be when it comes to digital goods. I'm not taking pennies in any sense here, preventing the next person from making use of some public good.
You can make a similar argument for piracy or open source, and yet... Here we all still are and open source has won for the most part.
CaptainFever · 11h ago
I think back to the original idea of free software.
The GPL protects users from any restrictions the author wants to use. No additional restrictions are allowed, whether technical or legal.
In this case, the restriction is social, but is a restriction nonetheless (some enforce it by harassment, some by making you feel bad).
But you could ignore it, even fork it and create a white label version, and be proud of it (thereby bypassing the restriction). Donate voluntarily if you want to contribute, without being restricted technically, legally, or socially.
lytedev · 9h ago
I agree with your comment here, and would add that I believe the license and open source in general has a certain social restriction as well and implies how the software may or may not be used, which is part of what makes this discussion nuanced and difficult, as it appears there are two true and opposing points.
jezek2 · 12h ago
And the author is breaking a social contract of not shoving stuff I don't want to see in an excessive amount (or being a contributor of it). Before, I wouldn't have minded seeing some anime here or there; it's quite cute to most people. But lately I see it in many more places, and more aggressively.
Some project even took it to the next level and displayed furry porn. I think anime and furry graphics are related, especially in the weird obsession with shoving them at unsuspecting people, but since it's "cute" it's passable. Well, unless it gets into porn territory.
On the other hand, I applaud the author for an interesting variation on making the free product slightly degraded so people are incentivized to donate money. The power of defaults, and their misuse.
Personally I'm not fan of enshittification of any kind even a slight one even when it's to my own detriment.
pjerem · 10h ago
> And the author is breaking a social contract of not shoving stuff I don't want to see in an excessive amount.
Except the author is not shoving any stuff at you. The author doesn't owe anything to you and can do whatever they want, and you don't owe the author the obligation to use their software.
It's not business; it's a person giving something free to the world and asking people who use it to play the game. You can choose to not play the game, or to not use it, but you can't act like your issue with an anime character is the author's fault. Just don't install it on your server and go ahead.
jezek2 · 8h ago
Not directly. But he knows it will get used in the current unfortunate landscape and that people will put it in front of their web pages. Then, as a visitor to these pages, I'm forced to see it. So yes, indirectly he is shoving this stuff at people.
some_furry · 55m ago
> Not directly. But he knows
Are you sure you have the right pronouns for Xe?
Philpax · 8h ago
> Some project even took it to the next level and displayed a furry porn. I think anime and furry graphics are related, esp. in the weird obsession of the people to shove it to the unsuspecting people, but since it's "cute" it's passable. Well unless it gets into the porn territory.
This is your weird association and hang-up. That's on you to deal with, not Anubis or the rest of the internet.
otterley · 19h ago
This is a very innovative way to earn a living with open source! Make the free version sickeningly cutesy (no offense to the author intended), and charge for the professional-looking version. No change in functionality, just chrome.
xena · 19h ago
I am actually working on changing functionality for paid customers, it's just access to a bigger database of default rules and IP reputation tracking.
otterley · 19h ago
I wish you best of luck! You're a very talented developer and artist. I'd be thrilled to work with you someday.
xena · 16h ago
Thanks! I'll be sure to post through it either way. My failure condition is going back to work somewhere else, so worst case it'll be more likely to happen :)
Really though my dayjob kinda burns me out because I have to focus on AEO, which is SEO but for AI. I get by making and writing about cool things, but damn does it hurt having to write for machines instead of humans.
LPisGood · 21h ago
I’ve heard people say that before. They would love to use it if there wasn’t a playful animated character.
The code is open source, so I can’t imagine making a fork to remove that is a Herculean effort.
unsnap_biceps · 21h ago
When I last looked into it, they were planning a white label service to customize the look and have been requesting folks to not fork and modify the images.
> Regardless, Xe did ask nicely to not change out the images shipped as a whitelabel service is planned in the future
I've soft launched the commercial offering and I'm working on expanding the commercial features before I announce it more publicly. If you pay $50 a month on GitHub sponsors, you get access to BotStopper complete with custom CSS support. You'll also get access to the reputation database I'm working on named hivemind.
yjftsjthsd-h · 19h ago
> You'll also get access to the reputation database I'm working on named hivemind.
That feels uncomfortably close to returning to the privacy-and-CGNAT-hating embrace of cloudflare et al.
ketzo · 19h ago
> reputation database I'm working on named hivemind.
Anywhere I can read more about this? Sounds super interesting, and a cursory search didn’t show anything for it on your site.
Otherwise I’m sure I’ll hear about it soon anyway, at the rate Anubis is going!
xena · 19h ago
I'd be happy to talk about it if it existed; I'm still working out the details. But the basic idea is to take advantage of the fact that Anubis is a very popular project: from what I've seen in logs that server admins have submitted, the same IP blocks and the like hit many instances of Anubis, so some kind of IP reputation thing would work for this.
I am also working on some noJS checks, but I want to test them with paid customers in order to let a thousand flowers bloom.
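To sketch the idea (my guess at the shape of it, not the actual hivemind design; the names and thresholds below are made up): many deployments report abusive client IPs to a shared tracker, which scores network blocks so each instance can be more aggressive toward blocks that have misbehaved elsewhere.

```go
// Hypothetical sketch of a shared IP-reputation tracker, not the real
// "hivemind" service. Instances report abusive addresses; blocks that
// cross a threshold get flagged for harder challenges everywhere.
package main

import (
	"fmt"
	"net/netip"
)

type tracker struct {
	reports map[netip.Prefix]int // abuse reports per /24 block
}

func newTracker() *tracker {
	return &tracker{reports: map[netip.Prefix]int{}}
}

// report records one abuse report for the /24 containing addr.
func (t *tracker) report(addr netip.Addr) {
	if p, err := addr.Prefix(24); err == nil {
		t.reports[p]++
	}
}

// suspicious reports whether addr's block has been flagged often enough.
func (t *tracker) suspicious(addr netip.Addr, threshold int) bool {
	p, err := addr.Prefix(24)
	return err == nil && t.reports[p] >= threshold
}

func main() {
	t := newTracker()
	t.report(netip.MustParseAddr("203.0.113.7"))  // reported by site A
	t.report(netip.MustParseAddr("203.0.113.42")) // reported by site B
	// A third address in the same block is now treated as suspicious.
	fmt.Println(t.suspicious(netip.MustParseAddr("203.0.113.99"), 2)) // true
}
```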
pabs3 · 13h ago
That sounds a bit like what crowdsec does for SSH.
https://github.com/crowdsecurity/crowdsec
Cool. Good luck on both that and Anubis generally — seems like you’ve found something that’s both a meaningful benefit to the common good AND could maybe make a buncha money, or at least enough to pay for development, which is awesome.
xena · 17h ago
Thanks! There are a lot of really hard problems to solve and most of them hinge on trust. I usually default to solving trust by making things open, but security software needs a bit of cloak and dagger by necessity. I'll find a balance I'm sure, but it's an annoying thing to balance.
LPisGood · 20h ago
That’s the beautiful thing about open source: they ask but do not demand.
Of course, if you use this service for your enterprise, the Right Thing To Do would be to support the excellent project financially, but this is by no means required.
If you want to use this project on your site and don’t like the logo, you are free to change it. If the site is personal and this project is not something you would spend money on, I don’t even think it is unethical to change the image.
altairprime · 20h ago
Seems pretty unethical to me. Exercising a liberty in direct contradiction to its creator’s wishes for personal gain with no recompense to them is about as crassly selfish and non-prosocial as it gets. Perhaps your ethics don’t include “being prosocial towards those whose work benefits you”? That’s the usual difference I encounter between my ethics and those who disagree that it’s crass — and I do respect such differing beliefs.
Note that I’m not faulting you for behaving this way, no insult or disparagement intended, etc.! Open source inherited this dissonance between giving it all away to anyone who asks for free, and giving nothing of yours back in return because prosocial is not an ethical standard, from its predecessor belief system. It remains unsolved decades later, in both open source and libertarianism, and I certainly don’t hold generic exploiters of the prosocial-imbalance defect accountable for the underlying flaw in both belief systems.
LPisGood · 19h ago
If the authors wanted to disallow people to be free (as in freedom) to change the source code for free (as in beer), then the authors had every chance to publish the source code under a more restrictive license.
I’m trying to imagine how this might be unethical. The only scenario I can think of is if the authors wanted the code to not be modified in certain ways, but felt based on more deeply held principles that the code should be made FOSS. But I struggle to see how both ideas could exist simultaneously - if you think code should be free then you think there is no ethical issue with people modifying it to fit their use.
altairprime · 19h ago
Yep, that’s the struggle in a nutshell!
If you believe in giving away code because that’s open-source prosocial, then open-source adherents will claim that taking advantage of you is ethical, because if you didn’t want to be exploited, you shouldn’t have been open-source prosocial in the first place. And by treating “pay me if you get paid for my code” licenses as evil and shameful, exploiters pressure prosocial maintainers into adopting open source licenses, even though they’ll then be exploited by people who don’t care about being prosocial, eventually burning out the maintainer, who either silent-quits or rage-quits.
Of course, if OSI signed off on “if you get rich from my source code you have to share some of that wealth back to me” as a permissible form of clause in open source licensing, that would of course break the maintainer burnout cycle — but I’m certainly not holding my breath.
blackoil · 17h ago
That only applies if the author wants to call the software "Open Source". You can license it under "SourceAvailableForSmallGuy" with no resistance.
lytedev · 9h ago
I think there will be at least some resistance to any license that isn't largely unrestricted.
But I do agree that this is the crux of the issue.
fc417fc802 · 8h ago
> treating “pay me if you get paid for my code” licenses as evil and shameful
Blatantly untrue. Companies riding the coattails of the opensource moniker for PR points while using restrictive licenses is what garners all the hate. It's essentially fraud committed to garner good press.
The other thing that gets people riled up is companies with a CLA that they claim is for responsible stewardship suddenly pulling a fast one and relicensing the project to a non-OSI license. It's perfectly legal but it tends to upset people.
There's absolutely nothing wrong with source available software at any level of restriction. Just be very clear about what it is and isn't.
nkrisc · 9h ago
Removing some stupid cartoon character is hardly a huge ethical violation, despite their wishes.
Sure, you can say it’s unethical in that it directly contravenes their request - I won’t argue that - but it’s the smallest of violations.
As far as I can see it’s MIT licensed so you have no legal obligation otherwise. If they truly cared about people keeping the character, they should have made the request with teeth.
I don’t even understand why they made the request in the first place. The nature of the request makes it seem as though it isn’t actually important at all, so why make the request at all? It just puts everyone else in an uncomfortable position. If keeping the character is important, then why release it under MIT license?
See also: “Npm should remove the default license from new packages” https://news.ycombinator.com/item?id=43864518
sgc · 19h ago
You are presuming this is their primary concern. Releasing software with a permissive license is a pretty strong signal you are ok with people not doing exactly as you ask.
altairprime · 18h ago
It’s certainly a legal signal, insofar as once you have that signal, you have the ability to make a legally-sound decision on usage — but I don’t presume that it’s in any way an indication of how strongly the author is or isn’t invested in whatever license they chose. Unless accompanied by something written by the maintainer, the only certain statement is that the maintainer released with a metadata attribute set to a value; nothing more.
The purpose of a software license is to codify the rights the author grants to its users. The author can't claim to use a free software license, while also making separate demands about how the software can be used. These demands should either be part of the license, or removed altogether. This moral shaming for breaking a "social contract" is ridiculous. The software is either free or not. You can't have it both ways.
altairprime · 12h ago
“Don’t use this for evil” is a legal and valid software license. This is anathema to programmers and law-as-code adherents, but it’s perfectly acceptable to bring to a court of law in a licensing dispute. Different courts and different acts of accused evil will result in different judgments. It would be very difficult for a corporation to accept that license; it would be very simple for an individual to do so.
Such a license does not comply with your requirements; yet, it is also valid under case law, even if it is statistically unlikely to permit enforcement against most claimed evils. Each society has certain evils that are widely accepted by the courts, so it certainly isn’t a get out of all possible jails free card.
The purpose of a license is to inform of the rights available. The user is responsible for evaluating the license, or for trusting the judgment of a third party if they are uninterested in evaluating themselves.
If the author’s entire license is “This is free software for free uses, please contact me for a paid license for paid uses”, then that is statistically likely to be court-enforceable against exploitation, so long as the terms offered are reasonable to the judge and any expert witnesses called. The Free Software Foundation does not have exclusive rights to the words “free software”. Adoption will be much reduced for someone who writes such a license, of course, and perhaps someone will exploit a loophole that a lengthier outsourced license would have covered. Neither of those outcomes is necessarily worth the time and effort to try to prevent, especially when use of any open source license guarantees the right of exploitation for unshared profit in plain language, whereas the homegrown one does not.
(I am not your lawyer, this is not legal advice.)
imiric · 12h ago
This is not a legal matter, nor is it related to the FSF and any of the "open source" licenses. My argument is philosophical.
Using a license that allows the software to be distributed and modified, while placing restrictions or exemptions to those permissions outside of the license, at the very least sends mixed signals. My point is that if the author wants to make those restrictions, that's fine, but the license is the correct place for it. What's shitty from my moral perspective is using a commonly accepted free software license for marketing purposes, but then judging people for not following some arbitrary demands. If anything, _that_ is the unethical behavior.
sgc · 7h ago
I completely agree with you. I just want to point out that the actual software author here is not being aggressive about it. They make a request and that's it. Nor are the other 55 contributors visible on github.
"we ask (but not demand, these are words on the internet, not word of law) that you not remove the Anubis character from your deployment"
For whatever reason somebody decided to blow it out of proportion here on hn.
imiric · 6h ago
Well, sure, but the author is also labeling people who don't comply with their request as "cowards" in this very thread. So by the same token that they kindly make a request, they can also refrain from passing judgment on people who kindly don't comply. And the same goes for people who pass their judgment on the author's behalf, or make a point about some "social contract".
sgc · 3h ago
Sure, I can agree with that too. Although I think 3rd party outrage is much worse, and borderline infantile.
imiric · 13h ago
> Seems pretty unethical to me. Exercising a liberty in direct contradiction to its creator’s wishes for personal gain with no recompense to them is about as crassly selfish and non-prosocial as it gets.
You're ignoring the possibility that users of the software might not agree with the author's wishes. There's nothing unethical about that.
A request to not change a part of the software is the same as a request to not use the software in specific industries, or for a specific purpose. There are many projects that latch on open source for brand recognition, but then "forbid" the software to be used in certain countries, by military agencies, etc. If the author wants to restrict how the software can be used, then it's not libre software.
altairprime · 12h ago
I disagree. Having the freedom to choose to ignore someone’s wishes does not necessarily make it ethical to exercise that freedom. Ethics are not as simple as “what is not prohibited is therefore ethical”.
CaptainFever · 11h ago
Ethics is also not as simple as "the author's wishes are always to be respected". For instance, free software was built on the ethical principle that restrictions on users' four fundamental freedoms (whether that be legal, technical, or in this case social), by IP holders, are unethical. This justifies piracy, and definitely justifies breaking this request.
As an ethical subjectivist, I don't believe it is possible to reconcile these ethical views.
fc417fc802 · 8h ago
I think there might be cases where the ethical thing to do would be to respect an author's non-binding request. However the request in this case seems directly contradictory to the principles of open source software and thus I can't bring myself to see it as legitimate.
Edit to add, an example of a non-contradictory request might be to contribute monetarily in proportion to the financial benefit you derive. It's an additional non-binding request to help sustain the community which seems reasonably consistent with the ethos of opensource to me.
The issue is that opensource is a movement that comes with a set of values attached. The licenses aren't impersonal the way the copyright system at large is.
jillyboel · 19h ago
The license explicitly allows you to make such changes. They could have picked a different license, but didn't.
> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software
altairprime · 18h ago
> They could have picked a different license, but didn’t.
I disagree.
Licenses that prohibit exploitation of source code for personal reward are treated with hostility, shame, and boycotts — claiming that to restrict in any way the liberty of another person to exploit one’s work is unethical. Human beings are social creatures, and most human beings are not asocial with decoupled ethical systems like myself; so, given the social pressures in play, few human beings truly have the liberty to pick another license and endure the shame and vitriol that exercising that freedom earns from us.
lytedev · 8h ago
I don't think it's fully correct that social pressure means that permissive licenses are no longer meaningful when it comes to the ethics or sociology of open source software.
Since the original subject is also about swapping out the imagery, it's also difficult to take your argument too seriously as the term "exploit" is doing a lot of heavy lifting for your argument.
I will also add that the social and ethical component goes both ways: is it ethical to knowingly give something away freely and without restriction and then immediately attempt to impose restrictions through a purely social mechanism? I would say so as long as your expectation is that some might politely decline.
Or worse, some may respond with the same vitriol and then we're at your original point, which doesn't seem to be preventing such an approach here, making me doubt your hypothesis.
fc417fc802 · 8h ago
> Licenses that prohibit exploitation of source code for personal reward are treated with hostility, shame, and boycotts
I'd have to disagree. However let's just run with it because your subsequent reasoning doesn't seem consistent to me.
If you do A you'll be met with hostility. So instead you do B, but then you add a request "actually please abide by A" and somehow this is supposed to not be met with hostility? You can't have it both ways. B but with an addendum that makes it A is just A wearing a mask. Changing the name doesn't change the thing.
lelanthran · 13h ago
> Seems pretty unethical to me.
I'm seeing this sentiment multiple times on this thread - "fine, it's legal, but it's still wrong!"
That's an extremely disrespectful take on someone adhering to a contract that both parties agreed to. You are using shaming language to pressure people into following your own belief system.
In this specific instance, the author could have chosen any damn license they wanted to. They didn't. They chose one to get the most adoption.
You appear to want both:
1. Widespread adoption
and
2. Restrict what others can do.
The MIT license is not compatible with #2 above. You can ask nicely, but if you don't get what you want you don't get to jump on a fucking high horse and religiously judge others using your own belief system.
Author should have used GPL (so any replaced images get upstreamed back and thus he has control) OR some other proprietary license that prevents modifications like changing the image.
A bunch of finger-pointers gabbing on forums about those "evil" people who stick to both the word and the spirit of the license are nothing more than the modern day equivalent of witch-hunters using "intent" to secure a prosecution.
Be better than that - don't join the mob in pointing out witches. We don't need more puritans.
pabs3 · 13h ago
The LGPL/GPL/AGPL family of licenses don't require upstreaming, only passing source code downstream to end users.
In this case upstreaming replaced images wouldn't be useful to the author anyway, they are going to keep the anime image.
lelanthran · 13h ago
> In this case upstreaming replaced images wouldn't be useful to the author anyway, they are going to keep the anime image.
In this case, it would be, because (presumably) the new images are the property of the user, and they would hardly want (for example) their company logo to be accidentally GPL'ed.
altairprime · 13h ago
I do not agree with your position that two parties who enter into a contract are no longer subject to ethical judgment by others. Contract law does not invalidate ethics, no matter how appealing it is to opt out of them. As one of the asocial / decoupled people who has no social compulsion whatsoever, I voluntarily opt-in to preferring prosocial outcomes and typically deem anti-prosocial actions unethical even if our society currently accepts them.
For example, if an employee does something hostile towards society at their employer when they have the freedom to choose not to do so — and since employment is at will, they always have that freedom to choose — I will tend to judge their antisocial actions unethical, even if their contract allows it. (This doesn’t mean I will therefore judge the person as unethical! One instance does not a pattern make, etc.)
So, for me, ethical judgments are not opt-out under any circumstance, nor can they be abrogated by contract or employment or law. I hold this is a non-negotiable position, so I will withdraw here; you’re welcome to continue persuading others if you wish.
lelanthran · 13h ago
> Contract law does not invalidate ethics, no matter how appealing it is to opt out of ethics
I didn't claim it does. I am claiming that since ethics is subjective and the contract is not, you subjecting others to your moral standard is no different than a mob subjecting an old woman to accusations of being a witch.
Now, you may not have a problem publicly judging others, but your actions are barely different from those of the Westboro Baptist Church.
IOW, sure, you are allowed to publicly condemn people who hold different moral beliefs to you, but the optics are not good for you.
Snow_Falls · 2h ago
You're using some really emotional language about what is really not such a huge issue. Maybe it's time to go offline for a while?
"no different than a mob subjecting an old woman to accusations of being a witch."
Well, you're not being driven out of your village or being executed...
Also, the person you're replying to has been rather polite. Hardly a witch hunt, is it?
"barely different from those of the Westboro Baptist Church"
The church that interrupts the grieving of the families of dead soldiers to shout about how much they hate gay people? You seriously believe that the person you're replying to is "barely different" from that?
"IOW, sure, you are allowed to publicly condemn people who hold different moral beliefs to you, but the optics are not good for you."
You're literally condemning them for having different moral beliefs than you right now, while being much more accusatory about it, comparing them to some really vile people. I wonder how you feel the optics of this reflect on you, because I don't think it's good for you.
Why are you so offended that someone might judge you for ignoring the friendly request of someone giving you something for free?
The author's quite reasonable and polite request not to change the appearance of the project is pretty straightforward, so morally, no, it cannot. Feel free to write your own version though. I hope I helped.
clvx · 18h ago
Just fork, change and move on.
If you like it, contribute back or pay some sponsorship.
https://git.kernel.org/ changed theirs
If I see a cute cartoon with a cryptocurrency-mining-style "KHash/s" readout, I am gonna leave that site real quick!
It should explain that it isn't mining, just verifying the browser or some such.
lytedev · 9h ago
It includes links with explanations, but the page does kind of "fly by" in many cases. At which point, would you still leave?
I'm guessing folks have seen enough captcha and CloudFlare verification pages to get a sense that they're being "soul" checked and that it's not an issue usability-wise.
You're correct that there's not really anything stopping a person from ripping you off, tweaking your work just enough that it's not a copyright violation. But unless that person themselves has a really good grasp of the topic and can contribute, it will become clear that they are getting the content elsewhere, and the readers will end up there in the end. Many, not all obviously, will also provide attribution, something LLMs rarely do.
Then you have the issue that the person publishing something on their own little server now has to deal with commercial companies hammering their site into the ground, just so someone can run an automated version of content theft?
A lot of things people could potentially do are minor issues, until it's automated and commercialized.
I have a personal blog. It's free. I write because I want humans to read my work, not because I want to provide free labor to AI companies.
This argument doesn't work here.
AI scraping bots provide zero value to site owners.
Anubis is DDoS protection, just with updated marketing. These tools have existed forever, such as CloudFlare Challenges, or https://github.com/RuiSiang/PoW-Shield. Or HashCash.
I keep saying that Anubis really has nothing much to do with AI (e.g. some people might mistakenly think that it magically "blocks AI scrapers"; it only slows down abusive-rate visitors). It really only deals with DoS and DDoS.
I don't understand why people are using Anubis instead of all the other tools that already exist. Is it just marketing? Saying the right thing at the right time?
Anubis is getting real love out there and I think I am all for it. I personally host a lot of my stuff on Cloudflare due to it being free with Cloudflare Workers, but if I ever have a VPS, I am probably going to use Anubis as well.
[1]: https://github.com/vaxerski/checkpoint
How anyone can provide a cryptographic challenge without JavaScript feels like black magic.
Can you please explain to me how it works without JavaScript?
Javascript might be better to run in scratchpad.
Care to share existing solutions that can be self-hosted? (Genuine question; I like how Anubis works, I just want something with a more neutral look and feel.)
If it is perfect for your needs other than the look, you could update the superficial parts to match your liking?
If it is designed in such a way as to make this difficult, such as if the visible content & styling is tangled within the code rather than all in static assets (I've not looked at the code myself yet), then perhaps raise an issue suggesting that this is changed (or if you are a coder yourself, perhaps do so and raise a pull request for your changes).
Given how popular the tool seems to be becoming, I expect this sort of theming will be an official feature eventually anyway, if you are patient.
Of course the technique it uses is well known and documented, so there may already be other good implementations that match your visual needs without any of the above effort.
CloudFlare doesn't do that. Cloudflare's false positive rate is extremely high, as are the others. Mostly because they all depend on bleeding edge JS and browser functions (CORS, etc) for fingerprinting functionality.
Cloudflare is for for-profits and other situations where you don't care if you block poor people, because they can't give you money anyway. Anubis is for when you want everyone to be able to access your website.
btw it only works on AI scrapers because they're DDoSes.
One thing that I've noticed recently with the Arch Wiki adding Anubis, is that this one week period doesn't magically fix user annoyances with Anubis. I use Temporary Containers for every tab, which means that I constantly get Anubis regenerating tokens, since the cookie gets deleted as soon as the tab is closed.
Perhaps this is my own problem, but given the state of tracking on the internet, I do not feel it is an extremely out-of-the-ordinary circumstance to avoid saving cookies.
Unfortunately nobody has a good answer for how to deal with abusive users without catching well behaved but deliberately anonymous users in the crossfire, so it's just about finding the least bad solution for them.
A sufficiently advanced web scraper can build a statistical model of fingerprint payloads that are categorized by CF as legit and change their proxy on demand.
The only person who will end up blocked is the regular user.
There is also a huge market of proprietary anti-bot solvers, not to mention services that charge you per captcha-solution. Usually it's just someone who managed to crack the captcha and is generating the solutions automatically, since the response time is usually a few hundred milliseconds.
This is a problem with every commercial Anti-bot/captcha solution and not just CF, but also AWS WAF, Akamai, etc.
Uhh, that's not right. There is a good answer, but no turnkey solution yet.
The answer is making each request cost a certain amount of something from the person, and increased load by that person comes with increased cost on that person.
All the best,
-HG
No, cost is used in the fullest abstract meaning of the word here.
Time cost, effort cost, monetary cost, work cost, so long as there is a functional limitation that prevents resource exhaustion that is the point.
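To make that concrete, here's a minimal sketch (an assumed design, not Anubis's actual code) of the "increased load means increased cost" idea: proof-of-work difficulty that scales with how many requests a client has made recently.

```go
// Sketch: per-client proof-of-work difficulty that rises with request
// volume, so a casual visitor pays almost nothing while a scraper's
// cost keeps climbing. A real system would also decay counts over time.
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
)

var (
	mu     sync.Mutex
	counts = map[string]int{} // requests seen per remote address
)

// difficultyFor returns the number of leading zero hex digits demanded:
// a base of 4, plus one more for every 100 requests from this client.
// Note r.RemoteAddr includes the port; a real system would key on the IP.
func difficultyFor(addr string) int {
	mu.Lock()
	defer mu.Unlock()
	counts[addr]++
	return 4 + counts[addr]/100
}

func handler(w http.ResponseWriter, r *http.Request) {
	d := difficultyFor(r.RemoteAddr)
	// A real gate would verify a submitted nonce before serving content;
	// this sketch only reports the price of admission.
	fmt.Fprintf(w, "solve a SHA-256 puzzle with %d leading zero hex digits\n", d)
}

func main() {
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```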
I use a certain online forum which sometimes makes users wait 60 or 900 seconds before they can post. It has prevented me from making contributions multiple times.
Cloudflare's checkbox challenge is probably the best of the challenge systems. Other security systems are far worse, requiring either something to be solved or a more annoying action (e.g. holding a button for 5 seconds).
The problem is when cloudflare doesn't let you through.
Don't use an unusual browser configuration then, like spoofing user-agents or whatever? If you're doing it for "privacy" reasons, it's likely counterproductive. The fact that cloudflare can detect it means that the spoofing isn't doing a very good job, and therefore you're making yourself more fingerprintable.
Examples?
Punishing people for not having Google cookies is probably the most obnoxious one.
For pure POW (no fingerprinting), mCaptcha is a nice drop-in replacement you can self-host: https://mcaptcha.org/
The issue I'm talking about is specifically how frustrating it is to hit yet another site that has switched to Anubis recently and having to enable cookies for it.
There's no real way to hide that you're visiting the site and clicking multiple pages during that visit, so I don't see what's so bad about accepting a first party cookie for an hour.
[1]: https://www.fixbrowser.org/blog/fixproxy
Is that why it now shows that annoying slow to load prompt before giving me the content I searched for?
[1] https://anubis.techaro.lol/docs/admin/algorithm-selection
The fast/slow selection still applies, but if you put up the difficulty, even the fast version will take some time.
edit: Because HN is throwing "you're posting too fast" errors again:
> That falls short of the "meets their needs" test. Authenticated users already have a check (i.e., the auth process). Anubis is to stop/limit bots from reading content.
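To put rough numbers on "will take some time" (assuming difficulty is measured in leading zero hex digits of a SHA-256 hash, as in most schemes of this kind): each extra digit multiplies the expected work by 16, so a client needs about 16^d hash attempts on average. At around a million hashes per second in a browser, difficulty 4 averages roughly 0.07 seconds, 5 is about 1 second, and 6 is about 17 seconds.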
Arch Wiki is a high value target for scraping so they'll just solve the anubis challenge once a week. It's not going to stop them.
The goal of Anubis isn't to stop them from scraping entirely, but rather to slow down aggressive scraping (e.g. sites with lots of pages being scraped every 6 hours[1]) so that the scraping doesn't impact the backend nearly as much
[1] https://pod.geraspora.de/posts/17342163, which was linked as an example in the original blog post describing the motivation for anubis[2]
[2]: https://xeiaso.net/blog/2025/anubis/
ISTR that Anubis allows the site-owner to control the expiry on the check; if you're still getting hit by bots, turn the check to 5s with a lower "work" effort so that every request will take (say) 2s, and only last for 5s.
(Still might not help though, because that optimises for bots at the expense of humans - a human will only do maybe one actual request every 30 - 200 seconds, while a bot could do a lot in 5s).
An obvious followup is to decrement it by a larger amount if requests are made at a higher frequency.
Yup. Anubis breaks the web. And it requires JavaScript, which also breaks the web. It’s a disaster.
I guess if your cookie expired at just the right time that could cause this issue, and that might be worth thinking about, but I think "breaks the web" is overstating it a bit, at least for the default configuration.
These crawlers are designed to work on 99% of hosts; if you tweak your site ever so slightly out of spec, these bots wouldn't know what to do.
Yes, it could be in a higher layer than what I suggested, indeed; on top of HTTP sounds good to me.
My rule of thumb is that it should work with curl (which makes it not anti-bot, but just anti-scraper & DDoS, which is what I have a problem with).
It's basic separation of responsibilities. It's helpful for reuse but also innovation. For example, the auth scheme baked in to HTTP is pretty much stuck in time and not very useful. We'd likely be better off if it wasn't tightly coupled to something unrelated like that. If I were implementing an HTTP stack I'd want to omit it, but that would make me noncompliant.
I think it's a great discussion though, one that gets to the heart of open source and software freedom, and how that can seem orthogonal to business needs depending on how you squint.
Oh hey, that’s a pretty utilitarian stack and I’m happy to see MariaDB be used out there.
Anubis is also really cool, I do imagine that proof of work might become more prevalent in the future to deal with the sheer amount of bots and bad actors (shame that they exist) out there, albeit in the case of hijacked devices it might just slow them down, hopefully to a manageable degree, instead of IP banning them altogether.
I do wonder if we’ll ever see HTTP only versions of PoW too, not just JS based options, though that might need to be a web standard or something.
Amazon, Akamai, Kasada and other big players in the WAF/Antibot industry will charge you millions for the illusion of protection and half-baked javascript fingerprint collectors.
They usually calculate how "legit" your request is based on ambiguous factors, like the vendor name of your GPU (good luck buying flight tickets in a VM) or how anti-aliasing is implemented on your fonts/canvas. Total bullshit. Most web scrapers know how to bypass it. Especially the malicious ones.
But the biggest reason why I'm against these kinds of systems is how they support the browser monoculture. Your UA is from Servo or Ladybird? You're out of luck. That's why the idea of choosing a purely browser-agnostic way of "weighing the soul" of a request resonates highly with me. Keep up the good work!
It would be great if there was a standard for that so that all kinds of clients knew how to provide a proof of work, e.g. like this:
Where sha256(abcXYZ) would have to start with at least 5 zeros. Seriously.
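A minimal sketch of what a client-side solver for that hypothetical scheme could look like (the challenge value and the header name in the final print are made up for illustration):

```go
// Sketch of a solver for the commenter's hypothetical standard: given a
// server-issued challenge ("abc"), find a suffix so that
// sha256(challenge+suffix) starts with 5 zero hex digits.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// solve brute-forces a nonce for the given challenge and difficulty.
// At 5 hex zeros this averages 16^5, about a million hashes, which is
// well under a second on commodity hardware.
func solve(challenge string, zeros int) string {
	prefix := strings.Repeat("0", zeros)
	for i := 0; ; i++ {
		nonce := strconv.Itoa(i)
		sum := sha256.Sum256([]byte(challenge + nonce))
		if strings.HasPrefix(hex.EncodeToString(sum[:]), prefix) {
			return nonce
		}
	}
}

func main() {
	nonce := solve("abc", 5)
	// The client would resend the request with the solution attached,
	// e.g. in a made-up header like this:
	fmt.Printf("X-Proof-Of-Work: challenge=abc; nonce=%s\n", nonce)
}
```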
The only issue I can think of is that there may be browsers or browser extensions that preload links to show thumbnails, and users might be banned without knowing why.
> As an attacker with stupid bots, you’ll never get through. As an attacker with clever bots, you’ll end up exhausting your own resources.
But the attack was clearly from a botnet, so the attacker isn’t paying for the resources consumed. Why don’t the zombie machines just spend the extra couple seconds to solve the PoW (at which point, they would apparently be exempt for a week and would be able to continue the attack)? Is it just that these particular bots were too dumb?
The likely explanation is that the bots are just curling the expensive URLs without a proper JavaScript engine to solve the challenge.
E.g. if I hack a bunch of routers around the world to act as my botnet, I probably wouldn't have enough storage to install Chrome or Selenium. The lightweight solution is just to use curl/wget (which may be pre-installed) or netcat/telnet.
A flat-out user-agent blacklist seems really weird; it's going to reward the companies that are more unethical in their scraping practices over the ones who report their user agent truthfully. From the repo it also seems like all the AI crawlers are set to DENY, which, again, would reward AI companies that don't disclose their identity in the user agent.
I'm aware that end users can modify the rules, but in reality most will just use the defaults.
And, of course, the link just shows the default behaviour. Website admins can change them to their needs.
I'm sure there will be workarounds (like that version of curl that has its HTTP stack replaced by Chrome's) but things are ever moving forward.
Honest AI scrapers use the information to learn, which increases their value, and the owner of the scraped server has to pay for it, getting nothing back — there's nothing honest about it. Search engines give you visitors, AI spiders only take your money.
And I would argue Anubis does nothing to stop real DDoS attacks that just indiscriminately blast sites with tens of gbps of traffic at once from many different IPs.
We shut down the website/http frontend to our git repo. There are still 20k distinct IP addresses per day hitting up a site that issues NOTHING but 404 errors.
Caching is already enabled, but this doesn’t work for the highly dynamic parts of the site like version history and looking for recent changes.
And yes, it doesn’t work for volumetric attacks with tens of gbps. At this point I don’t think it is a targeted attack, probably a crawler gone really wild. But for this pattern, it simply works.
If you have expensive URLs that you can't serve more than, say 3 of at a time, or 100 of per minute, NOT rate limiting them will end up keeping real users out simply because of the lack of resources.
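For example (a sketch with made-up endpoint names, not any particular server's code): cap the expensive endpoint with a semaphore so overload turns into a quick 503 for the excess, instead of resource exhaustion taking everyone down.

```go
// Sketch: allow at most 3 concurrent renders of an expensive page using
// a buffered channel as a semaphore; requests beyond that fail fast
// with 503 rather than piling up and starving real users.
package main

import (
	"log"
	"net/http"
)

var slots = make(chan struct{}, 3) // at most 3 renders in flight

func expensive(w http.ResponseWriter, r *http.Request) {
	select {
	case slots <- struct{}{}:
		defer func() { <-slots }()
		w.Write([]byte("rendered version history\n")) // stand-in for real work
	default:
		http.Error(w, "busy, try again shortly", http.StatusServiceUnavailable)
	}
}

func main() {
	http.HandleFunc("/history", expensive)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```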
They wait until your phone is on wifi / battery, then make requests on behalf of whoever has paid the analytics firm for access to 'their' residential IP pool.
INFATICA LTD
Reg. No.: 14863491
Unit A, 82 James Carter Road, Mildenhall, Suffolk, IP28 7DE, United Kingdom
2. The US is currently broken, and they are not going to punish the only, albeit unsustainable, growth in their economy.
3. The internet is global. Even if the EU wants to regulate, will they charge big tech leaders and companies with information-tech crimes that pierce the corporate veil? That would ensure that nobody invests in unsustainable AI growth in the EU. However, fucking up the economy and the planet is how the world operates now, and without infinite growth you lose buying power for everything. So everybody else will continue to do fuckery.
4. What can a regulating body do? Force disconnects for large swaths of the internet? Then the internet is no more.
By far most malware is legal and a portion of its income is used to fund election campaigns.
Volumetric DDoS and application layer DDoS are both real, but volumetric DDoS doesn't have an opportunity for cute pictures. You really just need a big enough inbound connection and then typically drop inbound UDP and/or IP fragments and turn off http/3. If you're lucky, you can convince your upstream to filter out UDP for you, which gives you more effective bandwidth.
If you run a fleet of servers, all doing different things, Apache is a good choice because all the various uses are going to be supported. It might not be the best choice in each individual case, but it is the one that works in all of them.
I don't know why some are so quick to write off Apache. Is it just because it's old? It's still something like the second most used webserver in the world.
I started using it when Oracle's Webcache wouldn't support newer certificates and I had to keep Oracle Portal running. I could edit the incoming certificate (I had to snip the header and the footer) and put it in a specific header for Portal to accept it.
Some differences:
- Uses HAProxy (duh)
- Proof of work can be either sha256 or argon2
- Optional recaptcha/hcaptcha in addition to the proof of work
- Includes a script for your page that will re-solve the challenge in the background before the cookie expires
There's also a control panel, dns server, etc. I kinda built my own everything because I refused to use bunny/cloudflare/whatever.
One thing I will say though, is that proof-of-work alone isn't a solution for ddos mitigation and bot protection! I've seen attackers using a mass of proxies and headless browsers to solve the challenge, or even writing code to extract and solve the challenge directly (https://github.com/lizthegrey/tor-fetcher). To adequately protect against more targeted attacks, you need additional acl and heuristics, browser fingerprinting, tls fingerprinting, ip reputation, etc. I do offer the whole thing setup as a commercial service, but will refrain from too much shilling.
It's fun, and I love seeing similar softwares help fight the horde of AI scrapers :^)
>Anubis is provided to the public for free in order to help advance the common good. In return, we ask (but not demand, these are words on the internet, not word of law) that you not remove the Anubis character from your deployment.
>If you want to run an unbranded or white-label version of Anubis, please contact Xe to arrange a contract.
https://anubis.techaro.lol/docs/funding
Hope this is useful to others!
Compare to a take-a-penny-leave-a-penny tray from an era past. You are legally allowed to scoop up all the pennies into a bag, and leave the store, then repeat at the neighboring store, and make a few bucks. You'd be an asshole, but not face legal trouble. You "followed the rules" to the letter. But guess what? If you publish an easy how-to guide with "one weird trick" for making some quick cash, and people start adopting your antisocial behavior and emptying out change trays, you've forced the issue and now either a) businesses will stop offering this convenience or b) the rules around it will be tightened and the utility will be degraded. In the concrete case of Anubis, the maintainers may decide to stop contributing their time to this useful software or place a non-FOSS license on it in an attempt to stop gain-maximizing sociopaths from exploiting their efforts.
I even it out by how I prioritize feature requests, bug reports, and the like :)
I didn't implement this out of fear or some lack of courage. In fact I had the original avatars up for quite a while. I simply wanted my own logo so visitors wouldn't be potentially confused. It seemed to fit the use case and there was no way to achieve what I wanted without reaching out. I didn't feel comfortable bugging you or anybody on account of my tiny little no-traffic git forge even though, yes, that is what you politely asked for (and did not demand).
I think if you do feel this strongly you might consider changing the software's license or the phrasing of the request in the documentation. Or perhaps making it very clear that no matter how small, you want to be reached out to for the whitelabel version.
I think the success story of Anubis has been awesome to read about and follow and seeing how things unfold further will be fun to watch and possibly even contribute to. I'm personally rooting for you and your project!
Your analogy to me seems imprecise, as analogies tend to be when it comes to digital goods. I'm not taking pennies in any sense here, nor preventing the next person from making use of some public good.
You can make a similar argument for piracy or open source, and yet... Here we all still are and open source has won for the most part.
The GPL protects users from any restrictions the author wants to impose. No additional restrictions are allowed, whether technical or legal.
In this case, the restriction is social, but is a restriction nonetheless (some enforce it by harassment, some by making you feel bad).
But you could ignore it, even fork it and create a white label version, and be proud of it (thereby bypassing the restriction). Donate voluntarily if you want to contribute, without being restricted technically, legally, or socially.
Some projects even took it to the next level and displayed furry porn. I think anime and furry graphics are related, especially in some people's weird obsession with shoving them at unsuspecting people, but since it's "cute" it's passable. Well, unless it gets into porn territory.
On the other hand, I applaud the author for an interesting variation on making the free product slightly degraded so people are incentivized to donate money. The power of defaults and their misuse.
Personally I'm not a fan of enshittification of any kind, even a slight one, even when it's to my own detriment.
Except the author is not shoving any stuff at you. The author doesn't owe anything to you and can do whatever they want, and you don't owe the author the obligation to use their software.