I am part of an informal group that actively archives websites, and sites behind Cloudflare captchas are barely archivable. I presumed Cloudflare had a deal with Archive.org, but I guess it went nowhere? https://blog.cloudflare.com/cloudflares-always-online-and-th...
sadeshmukh · 30d ago
It's still a setting in their dashboard, but the site owner has to manually enable Always Online.
charcircuit · 30d ago
Are you using iOS or macOS, so that you have access to private access tokens?
This looks like a useful solution for scraping. It doesn't prove you're a human, simply that you can afford to buy an iPhone. So buy the cheapest iPhone that supports this on eBay and then use that for scraping and archiving from now on.
lxgr · 30d ago
Given that these tokens are intentionally designed to distinguish human from bot traffic, I'd be surprised if they were (easily) available to archival tooling.
charcircuit · 30d ago
The URLSession API supports private access tokens (it's handled for you automatically) while your app is foregrounded.
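For anyone who wants to check whether a site offers a private access token challenge in the first place: as far as I know, the server signals it with a `WWW-Authenticate` header using the `PrivateToken` scheme, carrying `challenge` and `token-key` parameters. A rough Python sketch for spotting that header follows. The function name is made up for illustration, and it assumes the `PrivateToken` challenge is the only (or last) one in the header value. It only inspects the header; it cannot obtain a token, which still needs the OS-level support mentioned above.

```python
import re


def private_token_params(www_authenticate: str) -> dict:
    """Return the parameters of a PrivateToken challenge, or {} if none is offered.

    Hypothetical helper: assumes the PrivateToken challenge is the only
    (or last) challenge in the WWW-Authenticate header value.
    """
    # Find the scheme name at the start of the value or after a comma.
    match = re.search(r'(?:^|,)\s*PrivateToken\s+(.*)', www_authenticate, re.IGNORECASE)
    if not match:
        return {}
    # Collect key=value pairs; values may be quoted and may contain '=' padding.
    return dict(re.findall(r'([A-Za-z-]+)="?([^",\s]+)"?', match.group(1)))


if __name__ == "__main__":
    header = 'PrivateToken challenge="abc=", token-key="xyz"'
    print(private_token_params(header))
```

An empty result just means the site did not advertise this kind of challenge in that response; it says nothing about what other challenges it might serve.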
Oh, interesting! But I'd still expect these to be heavily rate-limited, etc. Otherwise, the people that captcha-protected sites are hoping to keep out could just use these, right?
charcircuit · 30d ago
At what rate are archivers solving Cloudflare challenges though? Probably not enough to hit any kind of rate limit. This is only used for the initial challenge and not for every request.
mellosouls · 29d ago
Plenty of other archives around the world; one would hope any impediments to them doing their job due to Cloudflare would have a more general solution than a single partner.
https://blog.cloudflare.com/eliminating-captchas-on-iphones-...
https://developer.apple.com/documentation/foundation/urlsess...