Now you can watch the Internet Archive preserve documents in real time

168 LorenDB 10 5/23/2025, 12:14:22 PM theverge.com ↗

Comments (10)

ignoramous · 31d ago
I am part of an informal group involved in actively archiving websites, and the ones behind Cloudflare CAPTCHAs are barely archivable. I presumed Cloudflare had a deal with Archive.org, but I guess it went nowhere? https://blog.cloudflare.com/cloudflares-always-online-and-th...
sadeshmukh · 31d ago
It's still a setting in their dashboard, but the site owner has to manually enable Always Online.
charcircuit · 31d ago
Are you using iOS or macOS to have access to private access tokens?

https://blog.cloudflare.com/eliminating-captchas-on-iphones-...

qingcharles · 31d ago
This looks like a useful solution for scraping. It doesn't prove you're a human, simply that you can afford to buy an iPhone. So buy the cheapest iPhone that supports this on eBay and then use that for scraping and archiving from now on.
lxgr · 31d ago
Given that these tokens are intentionally designed to distinguish human from bot traffic, I'd be surprised if they were (easily) available to archival tooling.
charcircuit · 31d ago
The URLSession API supports private access tokens (it's handled for you automatically) while your app is foregrounded.

https://developer.apple.com/documentation/foundation/urlsess...

lxgr · 31d ago
Oh, interesting! But I'd still expect these to be heavily rate-limited, etc. – otherwise, the people captcha-protected sites are hoping to keep out could just use these, right?
charcircuit · 31d ago
At what rate are archivers solving Cloudflare challenges, though? Probably not often enough to hit any kind of rate limit. The token is only used for the initial challenge and not for every request.
mellosouls · 30d ago
Plenty of other archives around the world; one would hope any impediments to them doing their job due to Cloudflare would have a more general solution than a single partner.
neom · 33d ago