Sometimes CPU cores are odd – Anubis

45 rbanffy 49 8/28/2025, 9:39:18 PM anubis.techaro.lol ↗

Comments (49)

perching_aix · 1h ago
The whole Anubis thing is a really interesting predicament for me.

I have Chrome on mobile configured as such that JS and cookies are disabled by default, and then I enable them per site based on my judgement. You might be surprised to learn that normally, this actually works fine, and sites are usually better for it. They stop nagging, and load faster. This makes some sense in retrospect, as this is what allows search engine crawlers to do their thing and get that SEO score going.

Anubis (and Cloudflare for that matter) force me to temporarily enable JS and cookies at least once anyway, completely defeating the purpose of my paranoid settings. I basically never bother to, but I do admit it's annoying. It's kind of up there with sites that have no content at all by default, only with JS on (high-profile example: AWS docs). At least Cloudflare only spoils the fun every now and then. With Anubis, it's always.

It's definitely my fault, but at the same time, I don't feel this is right. Simple static pages now require allowing arbitrary code execution and statefulness. (Although I do recognize that SVGs and fonts also kind of do so anyhow, much to my further annoyance).

PaulHoule · 47m ago
It seems like a whole lot of crap to me. Hostile webcrawlers, not to mention Google, frequently run Javascript these days.

Where I work our main product is a React-based web site with a JSON back end, you might go to

http://example.com/web/item/88841

and that will load maybe 20MB of stuff (always the same thing), and eventually, after the JS boots up, a useEffect() gets called that reads '88841' out of the URL and does a GET to

http://example.com/api/item/88841

which gets you nicely formatted JSON. On top of that the public id(s) are sequential integers so you could easily enumerate all the items if you just thought a little bit.
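A crawler that noticed this could skip the SPA entirely. A minimal sketch of that shortcut, using the placeholder host and paths from the comment above:

```javascript
// Sketch of the shortcut described above: since the public ids are
// sequential integers, a crawler can enumerate the JSON endpoint
// directly and skip the 20MB React bundle entirely.
const apiUrl = (id) => `http://example.com/api/item/${id}`;

async function crawlItems(firstId, lastId) {
  const items = [];
  for (let id = firstId; id <= lastId; id++) {
    const res = await fetch(apiUrl(id)); // plain GET, no headless browser
    if (res.ok) items.push(await res.json());
  }
  return items;
}
```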

We've had more than one obnoxious crawler, which we had reason to believe was targeted specifically at us, that would go to the /web/ URL and, without a cache, download all the HTML, Javascript, and CSS, then run the JS and download the JSON for each page -- at which point they were either saving the generated HTML or scraping the DOM. If they'd spent 10 minutes playing with the browser dev tools they would have seen the /item/ request and could probably have figured out pretty quickly how to interpret the results. As is, they have to figure out how to parse that HTML and turn it back into something like the JSON. Going direct would have saved them 95% of the bandwidth, 95% of the CPU, and whatever time they spent writing parsing code and managing their Rube Goldberg machine -- but I'd take 50% odds any day that they never actually did anything with the data they captured, because crawlers usually don't.

I know because I've done more than my share of web crawling, and I have crawlers that: capture plain HTTP data, can run Javascript in a limited way, and can run React apps. The last one would blast right past Anubis without any trouble except for the rate limiting, which isn't much of a problem because when I crawl I hit fast, I hit hard, and I crawl once. [1] (There's a running gag in my pod that I can't visit the state of Delaware because of my webcrawling.)

[1] Ok, sometimes the way you avoid trouble is to hit slow and hit soft, but still hit once. It's a judgement call whether you can hit them before they know what hit them, or whether you can blend in with the rest of the traffic.

bee_rider · 39m ago
I just bounce off those sites most of the time. Whatever, there’s still a lot of open internet.
altairprime · 1h ago
We have nothing to protect sites against scrapers except making access more expensive for everyone, unless privacy-compromising or authority-trusting methods are on the table.

Making you pay time, power, bandwidth, or money to access content does not significantly impede your browsing, so long as the cost is appropriately small. For the user above reporting thirty seconds of maxed-out CPU, that's excessive for a median normal person (but we hackers are not that).

If giving your unique burned-in crypto-attested device ID is acceptable, there’s an entire standard for that, and when your device is found to misbehave, your device can be banned. Nintendo, Sony, Xbox call this a “console ban”; it’s quite effective because it’s stunningly expensive to replace a device.

If submitting proof of citizenship through whatever attestation protocol is palatable, then Anubis could simply add the digital ID web standard and let users skip the proof of work in exchange for affirming that they hold a valid digital ID. But this only works if your specific identity can be banned; otherwise AI crawlers will just send a valid anonymized digital ID header.

This problem repeats in every suggested outcome: either you make it more difficult for users to access a site, or you require users to waste energy to access a site, or you require identifiable information signed by a dependable third-party authority to be presented such that a ban is possible based on it. IP addresses don’t satisfy this; Apple IDs, trusted-vendor HSM-protected device identifiers, and digital passports do satisfy this.

If you have a solution that only presents barriers to excessive use and allows abusive traffic to be revoked without depending on IP address, browser fingerprint, or paid/state credentials, then you can make billions of dollars in twelve months.

Ideas welcome! This has been a problem since bots started scraping RSS feeds and republishing them as SEO blogs, and we still don’t have a solution besides Cloudflare and/or CPU-burning interstitials.

(ps. I do have a solution for this, but it would require physical builds, be mildly unprofitable over time with no growth potential, and incite government hostility towards privacy-preserving identity systems. A billionaire philanthropist could build it in a year and completely solve this problem. Sigh.)

perching_aix · 27m ago
I actually do not have a problem with digital IDs, as long as my personal identity isn't being shared alongside it. Not to the site operator, not to the government.

This might seem contradictory, but I believe it's technically possible? What I don't think is that this is how these solutions currently work. The idea would be to prove that I am indeed a unique visitor who's a person according to the govt, without revealing the person info to the site or the site info to the govt, even if they collude.

Same with the whole 18+ goof. I'd actually quite like to try age-gated communities, like ±5 years of my age. I feel a lot of conflict stems from people coming from a bit too different walks of life sometimes. Could even do high-confidence location-based gating this way, which could also be cool (as well as the exact opposite of cool, because of course).

altairprime · 5m ago
[delayed]
fluoridation · 6m ago
Assuming a person can only have a single ID, how would that be enforced without a unique party having a 1-to-1 mapping between person and ID?
PaulDavisThe1st · 27m ago
> so long as the cost is appropriately small.

there are different metrics for cost, however. Measured in CPU utilization and/or time, it's hard to argue that Anubis exacts a high price.

But if it is important to you to not run javascript for whatever reason, the price of access to a site using Anubis is rather high.

tptacek · 45m ago
We very definitely do have stuff to protect sites that don't make it more expensive for everyone! Just none of it is open source.
necubi · 38m ago
And fundamentally, it can't be open source. Bot detection (like anti-fraud more generally) is an adversarial game that relies on hidden techniques. Open-sourcing it means you lose that advantage and make life much easier for anyone trying to get around it.
tptacek · 35m ago
I think there's probably a platform for it that you can open source --- the virtual machine, or the core of the virtual machine or something, but yeah, you're right, this is something Anubis will have to contend with long term; the effective solutions for this all benefit from obscurity.
PaulDavisThe1st · 26m ago
There's zero reason it cannot be open source. Proof-of-XXXXX schemes do not rely on obscurity to be functional.
tptacek · 16m ago
The schemes large players use to increase the cost of e.g. creating new accounts on their services do in fact rely on obscurity. They target developer cost, not compute cost.
Joker_vD · 2h ago
I think I actually saw a question on SO way back during the Windows Vista era when some guy asked if Windows supported machines with odd number of cores/processors, and the answer was "well, 1 is an odd number, you know".

Another joke from the same era: Having a 2 core processor means that you can now e.g. watch a film at the same time. At the same time with what? At the same time with running Windows Vista!

creatonez · 1h ago
Sure, but 1 is also a power of 2:

2^0 = 1

So the logic might make sense in people's heads if they've never encountered the 6- or 12-core CPUs that are common these days.

MindSpunk · 1h ago
Even long ago we had the AMD Phenom X3 chips which were 3 cores.
jsheard · 1h ago
The fun thing about those is they were physically quad cores with one core disabled, which may or may not have been defective, so if you were lucky you could unlock it and get a bonus core for free.
hinkley · 39m ago
Binning made the world weird.
neurostimulant · 51m ago
> In retrospect implementing the proof of work challenge may have been a mistake and it's likely to be supplanted by things like Proof of React or other methods that have yet to be developed.

> ... a challenge method that requires the client to do a single round of SHA-256 hashing deeply nested into a Preact hook in order to prove that the client is running JavaScript.

Why a single round? Doing the whole proof of work challenge inside the proof of react would be even more effective, right?

DiabloD3 · 2h ago
Wait, the Anubis people _didn't know_ 3 core machines were sold for years? AMD was famous for it!
nerdsniper · 1h ago
In their testing, even with odd numbers of physical cores, SMT caused an even number of logical cores. Some phones didn't have SMT, and also had an odd number of physical cores, but this was genuinely rare.

Also, they still might not (but probably learned). In this article they imply that each type of CPU core (what they call a "tier" in the article) will still be a power of two, and one just happened to be 2^0. I'm not sure they were around when the AMD Athlon II X3 was hot.

> Today I learned this was possible. This was a total "today I learned" moment. I didn't actually think that hardware vendors shipped processors with an odd number of cores, however if you look at the core geometry of the Pixel 8 Pro, it has three tiers of processor cores. I guess every assumption that developers have about CPU design is probably wrong.

jeffbee · 37m ago
> each type of CPU core (what they call a "tier" in the article) will still be a power of two

Yeah that's obviously not true, and believing it shows a marked lack of experience in the field. Of the current Xeon workstation lineup, only 3 of 14 SKUs have power-of-2 core counts. And there are consumer lines of CPUs with 6 cores and that sort of thing.

PaulDavisThe1st · 25m ago
I believe that the assumption was multiple of two, not power of two.
Sesse__ · 2h ago
What about... single-core machines?
john-h-k · 1h ago
The line of code in the article is `Math.max(nproc / 2, 1)`. So 1 core yields 1 thread. Only CPUs with an odd number of cores, no SMT, and >1 core will hit this bug. Not very common
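The failure mode is easy to see in isolation. A sketch (the function wrappers are mine, not from the article's source):

```javascript
// Sketch of the logic quoted above: nproc / 2 is float division in JS,
// so an odd core count yields a fractional thread count.
const threadCount = (nproc) => Math.max(nproc / 2, 1);

threadCount(8); // 4
threadCount(1); // 1   -- single core is handled correctly
threadCount(3); // 1.5 -- fractional "threads" on a 3-core, no-SMT CPU

// One possible fix: floor before clamping.
const threadCountFixed = (nproc) => Math.max(Math.floor(nproc / 2), 1);
threadCountFixed(3); // 1
```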
jsheard · 1h ago
In theory a CPU with SMT could still trigger this bug, because not every core necessarily has to have SMT. Intel made some chips that combined performance cores with SMT and efficiency cores without SMT, so if they had an odd number of E-cores they'd have ended up with an odd number of threads regardless.
jeffbee · 23m ago
You can also just boot linux with maxcpus=5 or any other number. Believing things about the parity of the number of CPUs is just nuts.
nerdsniper · 1h ago
SMT generally caused single-core CPUs to appear as 2 logical cores.

I realize Anubis was probably never tested on a true single-core machine. They are actually somewhat difficult to find these days outside of microcontrollers.

crote · 48m ago
Even in microcontrollers it is starting to become increasingly rare! We've progressed to a point where sub-$1 hobbyist chips like the RP2040 are multicore these days.
ChocolateGod · 40m ago
I have an S24+ and Anubis often runs poorly for me and fails. I tend to frequent tech-related sites, so browsing on my phone has been miserable for the last couple of months.

I checked the value of navigator.hardwareConcurrency on my phone and it returns 9... I guess that explains it.

It looks like setting light performance mode in device optimisations (I don't game on my phone) turns off the S24's sole Cortex-X4.

hinkley · 40m ago
Sometimes cores are fractional. Particularly thanks to Docker. I’m currently trying to get this fixed in several NodeJS situations.
Filligree · 1h ago
Ironically, this sat on the intermission page for a good half-minute while my fans spun up. Then I gave up; it was eating the battery.
ddulaney · 1h ago
Can I ask what hardware you’re using? I’ve heard similar things on the internet generally, but I’m on a several-years-old phone and it took under a second. Is the interstitial really that slow on some setups?
Filligree · 1h ago
I do a lot of random browsing on an old iPad. Which doesn't have fans, I know, that was short for "it got really hot".

I'm not sure what generation it is, but I bought it around a decade ago I think.

neurostimulant · 35m ago
Old browsers without crypto support would fall back to a pure-JS SHA-256 implementation, which I imagine would be slow on an old iPad.
yjftsjthsd-h · 54m ago
TIL the CPU count is exposed to JS. I guess that's fine? It feels nasty, but it's not really worse than all the other fingerprinting data we expose...
gck1 · 28m ago
Also fonts you have installed, the type of connection you're using, GPU parameters, keyboard languages on your system and so much more [1]

[1] https://abrahamjuliot.github.io/creepjs/

hinkley · 35m ago
It’s also frequently wrong when running in Docker. Some of that is libuv’s fault, some of it is cgroups deciding not to mask off /proc values that are wrong in the cgroup.
dmitrygr · 1h ago
> I guess every assumption that developers have about CPU design is probably wrong.

Javascripters, perhaps. Those who work on schedulers, or on kernels in general, would find this completely normal.

ranger_danger · 2h ago
> In retrospect implementing the proof of work challenge may have been a mistake

Why?

What would the alternative have been?

tptacek · 40m ago
Without getting into the alternatives: scraper defense isn't a viable proof of work setting, because there's no asymmetry to exploit. You're imposing exactly the same cost on legit users as you are on scrapers. Economies of scale mean that the marginal cost for your adversary is actually significantly lower than for your real users.

What the Anubis POW system is doing right now is exploiting the fact that there's been no need for crawlers to be anything but naive. But the cost to make them sophisticated enough to defeat the POW system is quite low, and when that happens, the POW will just be annoying legit users for no benefit.

I don't know if "mistake" is the word I'd use for it. It's not a whole lot of code! It's a reasonable first step to force crawlers to emulate a tiny fraction of a real browser. But as it evolves, it should evolve away from burning compute, because that's playing to lose.

tux3 · 2h ago
It does two things: Force everyone (including scrapers) to run a real JS engine, and force everyone to solve the challenge.

The first effect is great, because it's a lot more annoying to bring up a full browser environment in your scraper than just run a curl command.

But the actual proof of work only takes about 10ms on a server in native code, while it can take multiple seconds on a low-end phone. Given that the companies in question are building entire data centers to house all their GPUs, an extra 10ms per web page is not a problem for them. They're going to spend orders of magnitude more compute actually training on the content they scraped than solving the challenge.

It's mostly the inconvenience of adapting to Anubis's JS requirements that held them back for a while, but the PoW difficulty mostly slowed down real users.

jsnell · 2h ago
An unavoidable aspect of abuse problems is that there is no perfect solution. As the defender, you’re always making a precision vs. recall tradeoff. After you’ve picked off the low hanging fruit, most of the time the only way to increase recall (i.e. catch more abuse) is by reducing the precision (i.e. having more false positives, where a good user is falsely considered an abuser).

In an adversarial engineering domain, neither the problems nor the solutions are static. If by some miracle you have a perfect solution at one point in time, the adversaries will quickly adapt, and your solution stops being perfect.

So you’ll mostly be playing the game in this shifting gray area of maybe legit, maybe abusive cases. Since you can’t perfectly classify them (if you could, they wouldn’t be in the gray area), the options are basically to either block all of them, allow all of them, or issue them a challenge that the user must pass to be allowed. The first two options tend to be unacceptable in the gray area, so issuing a challenge that the client must pass is usually the preferred option.

A good counter-abuse challenge is something that has at least one of the following properties:

1. It costs more to pass than the economic value that the adversary can extract from the service, but not so much that the legitimate users won’t be willing to pay it.

2. It proves control of a scarce resource without necessarily having to spend that resource, but at least in such a way that the same scarce resource can’t be used to pass unlimited challenges.

3. It produces additional signals that can be used to meaningfully improve the precision/recall tradeoff.

And proof of work has none of those properties. The last two fail by construction, since compute is about the most fungible resource in the world. The first doesn't work because it's impossible to balance the difficulty factor such that it imposes a cost the attacker would notice but would still be acceptable to legitimate users.

If you add 10s to the latency for your worst-case real users (already too long), it'll cost about $0.01/1k solves. That's not a deterrent to any kind of abuse.
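The order of magnitude is easy to sanity-check. A back-of-envelope sketch where every input is an assumption, not a measurement:

```javascript
// Back-of-envelope for the claim above. Assumed inputs: a solve tuned to
// take ~10s on a slow client might take ~0.1s in optimized native code,
// and a cloud vCPU rents for roughly $0.04/hour.
const nativeSecondsPerSolve = 0.1;
const usdPerVcpuHour = 0.04;
const usdPer1kSolves = 1000 * nativeSecondsPerSolve * (usdPerVcpuHour / 3600);
// ~$0.001 per thousand solves -- the same order as the $0.01/1k figure
// above, and nowhere near a deterrent for a funded scraper.
```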

So proof of work is just a really bad fit for this specific use case. The only advantage is that it's easy to implement, but that's a very short-term benefit.

zetanor · 2h ago
In practice, any automated work that a real user is willing to wait through will be trivial to accomplish for an organization which scrapes the entire Internet. The real weight behind Anubis is the Javascript gate, not the PoW. It might as well just fetch() into browser.cookies.set().
MBCook · 2h ago
They also suggest maybe “proof of React” would be better with a link to this rough proof of concept:

https://github.com/TecharoHQ/anubis/pull/1038

Could someone explain how this would help stop scrapers? If you’re just running the page JS wouldn’t this run too and let you through?

fluoridation · 2h ago
Low-effort scrapers don't run JS, they just fetch static content.
MBCook · 1h ago
But then they couldn’t get past the current Anubis. So is the idea that it would just be cheaper for clients?
fluoridation · 1h ago
That's the idea. Impose software requirements on the client instead of computational requirements.
ranger_danger · 2h ago
They admitted that this was a 'shitpost'.

> how this would help stop scrapers

I think Anubis bases its purpose on some flawed assumptions:

- that most scrapers aren't headless browsers

- that they don't have access to millions of different IPs across the world from big/shady proxy companies

- that this can help with a real network-level DDoS

- that scrapers will give up if the requests become 'too expensive'

- that they aren't contributing to warming the planet

I'm sure there do exist some older bots that aren't smart and don't use headless browsers, but especially with newer tech/AI crawlers/etc., I don't think this is a realistic majority assumption anymore.

alright2565 · 2h ago
In part because this particular proof of work is absolutely trivial at scale: commercial SHA-256 hardware can do 390TH/s, while your typical phone can only manage on the order of a million hashes per second and still keep acceptable latency.
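Taking the commenter's figures at face value, the asymmetry works out as follows (both rates are the estimates from the comment, not measurements):

```javascript
// The asymmetry implied by the figures above.
const asicHashesPerSec = 390e12; // 390 TH/s commercial SHA-256 hardware
const phoneHashesPerSec = 1e6;   // rough in-browser rate on a typical phone
const ratio = asicHashesPerSec / phoneHashesPerSec;
// ratio = 3.9e8: a challenge sized to take a phone one full second
// amounts to roughly 2.6 nanoseconds of work at scale.
```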