ArchiveTeam has finished archiving all goo.gl short links
134 points by pentagrama on 8/17/2025, 5:46:04 PM | 34 comments (tracker.archiveteam.org)
Since the servers were mine, I could see what was happening, and I was very impressed. Within, I want to say, two minutes, the instances had been fully provisioned and were actively archiving videos as fast as possible, fully saturating the connection, with each instance knowing to grab only videos the other instances had not already gotten. Basically, they have always struck me as not only having a solid mission, but also being ultra-efficient in how they carry it out.
Edit: Like they kinda seem like an unnecessary middle-man between the archive and archivee, but maybe I'm missing something.
This is in contrast to the Wayback Machine's built-in crawler, which is just a broad-spectrum internet crawler without any specific rules, prioritizations, or supplementary link lists.
For example, one ArchiveTeam project had the goal to save as many obscure Wikis as possible, using the MediaWiki export feature rather than just grabbing page contents directly. This came in handy for thousands of wikis that were affected by Miraheze's disk failure and happened to have backups created by this project. Thanks to the domain-specific technique, the backups were high-fidelity enough that many users could immediately restart their wiki on another provider as if nothing happened.
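For the curious, the export trick looks roughly like this: MediaWiki's standard API has an export mode that returns a full XML dump (wikitext plus revision metadata) that Special:Import on another wiki can consume. This is only a sketch of the general technique, not the project's actual code, and the wiki URL is a placeholder.

    # Rough sketch of the MediaWiki export technique (not ArchiveTeam's actual code).
    # The wiki URL below is a placeholder.
    import urllib.parse
    import urllib.request

    def export_pages(api_url, titles):
        """Fetch an importable XML dump (wikitext + revision metadata) for some pages."""
        params = urllib.parse.urlencode({
            "action": "query",
            "export": 1,        # ask for a Special:Export-style XML dump
            "exportnowrap": 1,  # return the raw XML instead of wrapping it in the API response
            "titles": "|".join(titles),
        })
        with urllib.request.urlopen(f"{api_url}?{params}") as resp:
            return resp.read()  # XML that Special:Import on another wiki can consume

    # dump = export_pages("https://wiki.example.org/w/api.php", ["Main_Page"])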
They also try to "graze the rate limit" when a website announces a shutdown date and there isn't enough time to capture everything. They actively monitor for error responses and adjust the archiving rate accordingly, to get as much as possible as fast as possible, hopefully without crashing the backend or inadvertently archiving a bunch of useless error messages.
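As a rough illustration of what "grazing the rate limit" means in practice (the status codes and thresholds here are my guesses, not ArchiveTeam's actual tuning):

    # Toy version of rate-limit grazing: back off when the backend starts erroring,
    # creep back up when it recovers. All thresholds here are made up.
    import time
    import urllib.error
    import urllib.request

    def fetch_all(urls, min_delay=0.05, max_delay=30.0):
        delay = 1.0
        for url in urls:
            while True:
                time.sleep(delay)
                try:
                    with urllib.request.urlopen(url) as resp:
                        body = resp.read()
                    delay = max(min_delay, delay * 0.9)    # healthy response: speed up a bit
                    yield url, body
                    break
                except urllib.error.HTTPError as e:
                    if e.code in (429, 500, 502, 503):     # rate-limited or struggling backend
                        delay = min(max_delay, delay * 2)  # slow down, then retry the same URL
                    else:
                        yield url, None                    # other errors: record and move on
                        break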
They are the middlemen who collect the data to be archived.
In this example the archivee (goo.gl/Alphabet) is simply shutting the service down and has no interest in archiving it. Archive.org is willing to host the data, but only if somebody brings it to them. ArchiveTeam writes and organises the crawlers that collect the data and send it to Archive.org.
(Source: ran a Warrior)
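For a sense of how the Warriors avoid duplicating work (each instance "knowing to only grab videos the other instances had not already gotten", as described above), here's a toy version of the claim/mark-done pattern. In reality a central tracker hands out items to Warriors over HTTP; this sketch fakes it with a local SQLite table, and the table and column names are invented.

    # Toy claim/mark-done coordination, standing in for the real tracker protocol.
    # Table and column names are invented for illustration.
    import sqlite3

    def claim_item(db_path):
        """Atomically claim one pending item, or return None when nothing is left."""
        con = sqlite3.connect(db_path)
        try:
            con.execute("BEGIN IMMEDIATE")  # lock so two workers can't claim the same row
            row = con.execute(
                "SELECT id, url FROM items WHERE status = 'todo' LIMIT 1").fetchone()
            if row is None:
                con.commit()
                return None
            con.execute("UPDATE items SET status = 'claimed' WHERE id = ?", (row[0],))
            con.commit()
            return row
        finally:
            con.close()

    def mark_done(db_path, item_id):
        con = sqlite3.connect(db_path)
        con.execute("UPDATE items SET status = 'done' WHERE id = ?", (item_id,))
        con.commit()
        con.close()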
If the Internet Archive is a library, ArchiveTeam is the people who run around collecting stuff and hand it to the library for safekeeping. Stuff that is estimated or announced to be disappearing or removed soon tends to be the focus, too.
The list of short links and their target URLs can't be 91 TiB in size, can it? Does anyone know how this works?
Enlisting in the Fight Against Link Rot - https://news.ycombinator.com/item?id=44877021 - Aug 2025 (107 comments)
Google shifts goo.gl policy: Inactive links deactivated, active links preserved - https://news.ycombinator.com/item?id=44759918 - Aug 2025 (190 comments)
Google's shortened goo.gl links will stop working next month - https://news.ycombinator.com/item?id=44683481 - July 2025 (222 comments)
Google URL Shortener links will no longer be available - https://news.ycombinator.com/item?id=40998549 - July 2024 (49 comments)
Ask HN: Google is sunsetting goo.gl on 3/30. What will be your URL shortener? - https://news.ycombinator.com/item?id=19385433 - March 2019 (14 comments)
Tell HN: Goo.gl (Google link Shortener) is shutting down - https://news.ycombinator.com/item?id=16902752 - April 2018 (45 comments)
Google is shutting down its goo.gl URL shortening service - https://news.ycombinator.com/item?id=16722817 - March 2018 (56 comments)
Transitioning Google URL Shortener to Firebase Dynamic Links - https://news.ycombinator.com/item?id=16719272 - March 2018 (53 comments)
Per Google, shortened links "won't work after August 25 and we recommend transitioning to another URL shortener if you haven't already."
Am I missing something, or doesn't this basically obviate the entire gesture of keeping some links active? If your shortened link is embedded in a document somewhere and can't be updated, Google is about to break it, no?
(In addition to the higher-activity ones that the parent link says will now continue to redirect.)
Unless I'm just super smart (I'm not), it's pretty easy to write a URL shortener as a key-value system, and pure key-value stuff is pretty easy to scale. I can't imagine goo.gl isn't doing something at least as efficient as what I did.
Either way, we're talking about a dataset that fits easily in a 1U server with at most half of its SSD slots filled.
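To make the "key-value system" point concrete, here's a minimal sketch of a shortener along those lines: SQLite as the store, a 62-character alphabet like goo.gl's, and a made-up domain in the comment. It's illustrative only, not how goo.gl itself was built.

    # Minimal URL shortener as a pure key-value store (illustrative sketch only).
    import secrets
    import sqlite3
    import string

    ALPHABET = string.ascii_letters + string.digits   # 62 characters, goo.gl-style keys

    def open_db(path="links.db"):
        con = sqlite3.connect(path)
        con.execute("CREATE TABLE IF NOT EXISTS links (key TEXT PRIMARY KEY, url TEXT)")
        return con

    def shorten(con, url, key_len=6):
        while True:
            key = "".join(secrets.choice(ALPHABET) for _ in range(key_len))
            try:
                con.execute("INSERT INTO links VALUES (?, ?)", (key, url))
                con.commit()
                return key                    # serve e.g. https://sho.rt/<key> as a 301 to url
            except sqlite3.IntegrityError:
                pass                          # rare key collision: just roll a new key

    def resolve(con, key):
        row = con.execute("SELECT url FROM links WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None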
Even though all I did was set up the Docker container one day and forget about it.
How would that even work? I mean, did they just loop through every single permutation and record the result, or how exactly did they do it?
In short, yes. Since no one can make new links, it's a pre-defined space to search. They just requested every possible key, and recorded the answer, and then uploaded it to a shared database.
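In sketch form, the brute-force idea looks like the following. The alphabet and key length are assumptions, the real crawl was spread across many Warriors via the tracker and saved full WARC records rather than bare (key, target) pairs, and it obviously won't work once the redirects go dark, so treat it purely as an illustration.

    # Sketch of enumerating a fixed key space and recording where each key redirects.
    # Alphabet and key length are assumptions; the real project was distributed and
    # stored full WARC records, not just (key, target) pairs.
    import itertools
    import string
    import urllib.error
    import urllib.request

    ALPHABET = string.ascii_letters + string.digits   # assumed goo.gl key alphabet

    def resolve(key):
        """Return the URL that https://goo.gl/<key> leads to, or None if unassigned."""
        req = urllib.request.Request(f"https://goo.gl/{key}", method="HEAD")
        try:
            with urllib.request.urlopen(req) as resp:  # urllib follows the redirect
                return resp.geturl()                   # final target after redirection
        except urllib.error.HTTPError:
            return None                                # 404 (unused key) or blocked

    def all_keys(length):
        for combo in itertools.product(ALPHABET, repeat=length):
            yield "".join(combo)

    # for key in all_keys(6):
    #     print(key, resolve(key))   # in practice: batched, rate-limited, uploaded to the shared DB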