Anna's Archive: An Update from the Team

305 points by jerheinze on 8/18/2025, 4:31:48 PM | annas-archive.org | 86 comments

Comments (86)

cakealert · 2m ago

Can Anna's Archive claim to be a non-profit when it's effectively an illegal enterprise with unknown controllers?

They are even offering decent bounties: https://software.annas-archive.li/AnnaArchivist/annas-archiv...

Whoever is running it must be doing really well for themselves laundering all that crypto.

ofou · 23m ago

Shadow library maintainers deserve a Nobel Prize for their contributions to humanity. Satoshi would be proud.


vlade11115 · 40m ago

Also, they provide a torrents list that anyone can seed and be part of the long-term preservation.

https://annas-archive.org/torrents

whirlwin · 2m ago

Just curious: what is the future of services like these? More and more content will be AI-generated, at least to some degree. Should that content be aggregated too?

boombapoom · 55m ago

fuck those guys, Anna's Archive is one of the last good things about the internet.

Koshkin · 13m ago

> the last good things

Last but not least?

lysace · 24m ago

1. Information wants to be free. :-)

2. I used to think that way about The Pirate Bay guys until they hacked into the Swedish equivalent of the US Social Security number database and then fled to Cambodia. (Or did it from Cambodia. I don’t remember the exact timeline.)

What I mean to say is: I have been disappointed by my heroes before.

tzs · 12m ago

If #1 is a reference to a famous quote from Stewart Brand, founder of the Whole Earth Catalog, it's only part of the quote. The rest is relevant:

> On the one hand you have—the point you’re making Woz—is that information sort of wants to be expensive because it is so valuable—the right information in the right place just changes your life. On the other hand, information almost wants to be free because the costs of getting it out is getting lower and lower all of the time. So you have these two things fighting against each other

He stated later more succinctly:

> Information Wants To Be Free. Information also wants to be expensive. ...That tension will not go away

gjsman-1000 · 23m ago

> Information should be free

I'm sick and tired of this misquote: it was merely an observation of trends and was never meant to be a moral maxim or mandate. If you truly believe information needs to be free as a moral mandate, share your company's source code first.

danielPort9 · 19m ago

I see it as “everyone deserves respect”. No need to overanalyse it. It’s one of those few things in life that are simply true, no proof needed.

Ar-Curunir · 12m ago

People can do good things and bad things simultaneously. Unless my supporting the good things also directly enables the bad things, I don't see a reason to throw out the good thing.

Davidzheng · 20m ago

was the alternative for the pirate bay people jail time?

justin66 · 18m ago

"Anna’s Archive itself has organized some of the largest scrapes: we acquired tens of millions of files from IA Controlled Digital Lending"

Not really helping in the big picture, here, guys.

thorn · 49m ago

Kudos to the team behind this project! It looks like they have improved the UI in the last year. The crucial problem right now is staying accessible, or simply surviving. I have no idea how much effort goes into that. I wonder whether it is possible to stay afloat despite all the efforts to take them down.

jauntywundrkind · 23m ago

There was a pretty major UI update in the past 2-5 days-ish.

Apologies for the minor grumble, but on mobile I used to be able to browse search results much more effectively; the new design only fits ~4-5 results on a screen.

dulpo · 1h ago

This is surprising. I thought last I heard they'd arrested the guy who was suspected of running the site, about a year or so ago. Guess I'm misremembering.

Also I'm surprised Cloudflare hasn't shut them down like they do for other dodgy sites.

lode · 1h ago

When accessing from Belgium the link is blocked by Cloudflare:

Error HTTP 451 Unavailable For Legal Reasons

In response to a legal order, Cloudflare has taken steps to limit access to this website through Cloudflare's pass-through security and CDN services within Belgium

dulpo · 1h ago

Interesting. Seems to be only certain jurisdictions. I can access it no problem from the UK Vodafone network.

teekert · 1m ago

In my experience, setting Proton VPN to Albania gets you the full internet.

camtarn · 34m ago

I'm unable to resolve the domain on EE UK - looks like it's DNS blocked.

By comparison, on my work network (TalkTalk) I can resolve the domain but I get a connection reset from the site.

I think this might be the first time I've hit a DNS block. It feels rather eerie seeing people talking about a site that, from my point of view, doesn't even exist...

spacedcowboy · 1h ago

Hmm. Even the title link above doesn't work for me on Virgin Media cable in the UK.

dulpo · 53m ago

Do you see an error page / blocked page?

I used to get archive.org blocked and had to contact my provider to have the filters taken off.

spacedcowboy · 44m ago

Nope, it just takes forever, then eventually shows a blank screen...

barrell · 58m ago

Yep blocked by Ziggo in NL as well

telesilla · 51m ago

Whenever I'm in the Netherlands I need to set my DNS to 1.1.1.1 or similar, lots of blocks.

borski · 1m ago

Except that that’s Cloudflare, which is also blocking Anna’s Archive.

noble-lombax · 50m ago

I actually didn't know there were more error codes beyond error code 429

Mogzol · 43m ago

There's "431 Request Header Fields Too Large" which you will see occasionally. But after that 451 is the only other 400-level error code above 429. It was chosen as a reference to the book Fahrenheit 451.

mariusor · 31m ago

451 is kind of a novelty code; its number refers to Bradbury's sci-fi novel "Fahrenheit 451".

goku12 · 15m ago

Oh! You'll love this: 418 I'm a teapot

https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...
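If you want to check Mogzol's claim yourself, Python's standard library enumerates the registered status codes in `http.HTTPStatus`. A minimal sketch, assuming Python 3.9 or later (that release added `IM_A_TEAPOT`, so older versions list fewer codes); `client_errors_above` is just a hypothetical helper name:

```python
from http import HTTPStatus

def client_errors_above(threshold: int) -> list[tuple[int, str]]:
    """Return (code, phrase) pairs for 4xx statuses strictly above `threshold`."""
    return [
        (status.value, status.phrase)
        for status in HTTPStatus  # iterating an Enum skips alias names, so each code appears once
        if 400 <= status.value < 500 and status.value > threshold
    ]

for code, phrase in client_errors_above(429):
    print(code, phrase)

# The teapot goku12 mentions is registered as well.
print(HTTPStatus(418).phrase)
```

On a current Python this should list only 431 and 451, matching the comment above.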

5555624 · 59m ago

The two behind Z-Library were arrested in late 2022.

dulpo · 56m ago

Thank you, I think I must have got the details of that confused with the OCLC lawsuit.

baal80spam · 53m ago

annas-archive.li/blog, 2025-08-17

About recent events.

We are still alive and kicking. In recent weeks we’ve seen increased attacks on our mission. We are taking steps to harden our infrastructure and operational security. The work of securing humanity’s legacy is worth fighting for.

Since we started in 2022, we have liberated tens of millions of books, scientific articles, magazines, newspapers, and more. These are now forever protected from destruction by natural disasters, wars, budget cuts, and other catastrophes, thanks to everyone who helps with torrenting.

Anna’s Archive itself has organized some of the largest scrapes: we acquired tens of millions of files from IA Controlled Digital Lending, HathiTrust, DuXiu, and many more.

We have also scraped and published the largest book metadata collections in history: WorldCat, Google Books, and others. With this we’ll be able to identify which books are still missing from our collections, and prioritize saving the rarest ones.

Much thanks to all of our volunteers for making these projects happen.

We’ve forged some incredible partnerships. We’ve partnered with two LibGen forks, STC/Nexus, Z-Library. We’ve secured tens of millions additional files through these partnerships. And they are helping the mission by mirroring our files.

Unfortunately we have seen the disappearance of one of the LibGen forks. We don’t have further information about what happened there, but are saddened by this development.

There is a new entrant: WeLib. They appear to have mirrored most of our collection, and use a fork of our codebase. We have copied some of their user interface improvements, and are grateful for that push. Sadly, we are not seeing them share any new collections, nor share their codebase improvements. Since they haven’t shown commitment to contributing back to the ecosystem, we advise extreme caution. We recommend not using them.

In the meantime, we have some exciting projects in the works. We have hundreds of terabytes in new collections sitting on our servers, waiting to be processed. If you’re at all interested in helping out, feel free to check out our Volunteering and Donate pages. We run all of this on a minimal budget, so any help is greatly appreciated.

Keep fighting.

stonecharioteer · 1h ago

Please remain up. Libgen no longer works. I've used IRC for fiction and non-fiction, but tech books need Anna's Archive and Libgen. I buy the physical copy with my company budget so the author gets paid, but I need DRM-free ebooks to read comfortably on my Tab S9 Ultra.

DyslexicAtheist · 20m ago

libgen is still there

gregorygoc · 1m ago

What’s the url?

slt2021 · 1h ago

Anna's Archive is possibly the greatest site ever.

Infinite love to the team <3

xtracto · 58m ago

Kind of... the fact that they keep the actual data behind a "soft" paywall (waiting times and otherwise terribly slow transfers) makes me a bit skeptical of their "goodwill".

SimianSci · 18m ago

No such thing as free when bandwidth costs money. Any online service that hands out things for free without restriction is getting its return through unscrupulous means and shouldn't be trusted. Anna's Archive straddles the line: it lets people download books for free, but not at too great an expense to the volunteers who pay out of pocket to support the project.

nulld3v · 8m ago

I believe you only hit the paywall when you try to use the search engine & download individual files. They still offer the underlying data for free archival/mirroring via torrents.

0cf8612b2e1e · 53m ago

Their backdoor plan to get rich! Not going to fool me this time VCs!!

Everyone involved is taking on significant personal liability and hosting expenses. Not sure what more you expect.

klik99 · 20m ago

Yes spot on, crazy that asking for an optional pittance for less bandwidth throttling on such a huge and risky project can be seen as exploitative.

exe34 · 9m ago

you should ask for a refund!

mattl · 55m ago

Bandwidth isn’t free of charge

bibelo · 41m ago

and hosting

oguz-ismail · 42m ago

> We recommend not using them

I've been using WeLib since April and had a good experience so far

SimianSci · 21m ago

If efforts like this are to be sustainable in any lasting way, participants need to be cooperative, not parasitic. I agree with the Anna's Archive team: it serves no one to have one of the players in this space hoarding its own collections and not sharing them with other archiving projects. That makes the collection extremely vulnerable and at risk of becoming lost knowledge as time goes on.

jeron · 15m ago

I disagree with how this is framed. Shadow libraries thrive on decentralization; any other server mirroring a collection is better than no mirrors at all.

carlosjobim · 7m ago

No honour among thieves.

keroro · 33m ago

Why use them over Anna's Archive?

oguz-ismail · 2m ago

cleaner interface

max_ · 1h ago

The entire internet needs to be re-designed to stand up against attacks.

- DDOS attacks

- Spamming

- UK-like surveillance laws

- LLM scraping

Why is there almost no initiative for this?

grues-dinner · 57m ago

The Internet has been redesigned. It's just not been redesigned with your interests in mind and at least some of the "attacks" are features to the right people.

theturtletalks · 54m ago

The precursor to Bitcoin was an interesting project called Hashcash. It was built to combat email spam: it forced the sender to spend compute solving a moderately hard hash puzzle and put the result in a header. The recipient can then easily verify that the sender "paid" the cost.
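The mechanism described here fits in a few lines. This is a simplified Hashcash-style sketch, not the real Hashcash format: real Hashcash uses SHA-1 and a stamp like `1:bits:date:resource:ext:rand:counter`, whereas this sketch assumes SHA-256 and a made-up `resource:counter` string. The sender loops until the hash has enough leading zero bits; the recipient checks with a single hash.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros before the first set bit in this byte
        break
    return bits

def mint(resource: str, difficulty: int) -> str:
    """Sender side: brute-force a counter until the stamp meets the difficulty.

    Expected work is roughly 2**difficulty hash evaluations.
    """
    counter = 0
    while True:
        stamp = f"{resource}:{counter}"
        if leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= difficulty:
            return stamp
        counter += 1

def verify(stamp: str, resource: str, difficulty: int) -> bool:
    """Recipient side: a single hash confirms the sender 'paid'."""
    prefix, _, counter = stamp.rpartition(":")
    if prefix != resource or not counter.isdigit():
        return False
    return leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= difficulty

stamp = mint("alice@example.com", difficulty=16)
print(stamp, verify(stamp, "alice@example.com", 16))
```

The asymmetry is the whole point: minting costs tens of thousands of hashes at difficulty 16, while verifying costs one. A real scheme would also bind a date and random salt so a stamp can't be reused; that is omitted here.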

progval · 55m ago

There are, but they each have their tradeoffs.

Proof-of-work and micropayment schemes (e.g. Xanadu or Internet Mail 2000) solve spamming and LLM scraping, but are more expensive or more CPU-intensive.

P2P systems like Freenet do too, but they are harder to use, more storage-intensive, and make it easier to spy on individual users.

Tor solves UK-like surveillance laws but it's slower and makes it easier to spam.

freefaler · 1h ago

Decentralization and interoperability, including TCP/IP and its routing protocols, let the network grow freely, but they also make those kinds of attacks easier.

The easiest way to mitigate those problems would be to decrease openness and centralize more. That might lead to things even worse than DDoS.

GuB-42 · 20m ago

RFC 3514 [1] proposed an effective solution against attacks.

So you see, there are initiatives, but people treat it as a joke, maybe because of when it was released (April 1, 2003).

[1] https://www.ietf.org/rfc/rfc3514.txt
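For the curious, RFC 3514 puts the "evil bit" in the high-order bit of the IPv4 fragment offset field, i.e. the otherwise reserved first bit of the 16-bit flags/fragment-offset word at byte offset 6. A minimal sketch of the "firewall" check it enables, assuming you already have a raw IPv4 header as bytes; the sample header and the helper names are hand-made for illustration, not captured traffic or any real library's API:

```python
import struct

EVIL_BIT = 0x8000  # high-order bit of the 16-bit flags/fragment-offset word (RFC 3514)

def is_evil(ipv4_header: bytes) -> bool:
    """Return True if the RFC 3514 evil bit is set in a raw IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    # Bytes 6-7: 3 flag bits (reserved/evil, DF, MF) then the 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack("!H", ipv4_header[6:8])
    return bool(flags_and_offset & EVIL_BIT)

def set_evil(ipv4_header: bytes) -> bytes:
    """Return a copy with the evil bit set (the header checksum is NOT recomputed)."""
    (flags_and_offset,) = struct.unpack("!H", ipv4_header[6:8])
    return ipv4_header[:6] + struct.pack("!H", flags_and_offset | EVIL_BIT) + ipv4_header[8:]

# Hand-built 20-byte header: version/IHL 0x45, total length 20, DF set (0x4000), TTL 64, TCP.
benign = bytes([0x45, 0, 0, 20, 0, 0, 0x40, 0x00, 64, 6, 0, 0]) + bytes(8)
print(is_evil(benign), is_evil(set_evil(benign)))
```

The joke, of course, is that the check only works if attackers politely set the bit.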

uberman · 54m ago

Out of curiosity, do you see the archive in question as being part of the problem or that it needs protection from the issues you raise?

butchkass · 1h ago

Go right ahead

ilovefood · 1h ago

I fully agree. It's difficult, though, because I genuinely believe the solution space overlaps with cryptography, which is quickly discounted as a viable option because it is now laden with negative connotations.

goku12 · 25m ago

Cryptography has negative connotations? Like what? Do you mean cryptocurrency by any chance? (If so, it's feasible to practice cryptography without touching cryptocurrency).

gia_ferrari · 2m ago

Not OP, but in my bubble:

- DRM.

- Owner-unfriendly device locks (such as manufacturer-controlled secure boot or locked-down OSes).

- Inability to audit network traffic from one's own devices, e.g. an IoT device.

- Remote attestation, when used in opposition to open computing.

I could also see folks seeing the use of cryptography as "having something to hide" - I don't personally agree.

vpribish · 56m ago

nah. cryptography is not seriously held back by cryptocurrency

monster_truck · 1h ago

I'll start the wiki

meindnoch · 55m ago

I'll design the logo!

IAmBroom · 22m ago

I'll make a GUI in Visual Basic!

exe34 · 7m ago

I'll bring my axe!

anon191928 · 1h ago

Because they will come after any new design too. How do you not see this?

dulpo · 1h ago

Redesigned like how?

exe34 · 7m ago

the problem is that anybody who does that work will be targeted very quickly by the people in power.

even if it's decentralised, it'll be banned one way or another and you'll be hunted down.

random3 · 1h ago

"Be the change you want to see in the world"

NoMoreNicksLeft · 1h ago

I dread these. I still remember the RARBG announcement I saw here a few years back. Do I even dare click the link?

HedgeMage · 1h ago

Not that scary. Click it.

crest · 1h ago

They just announced that they're still in the fight.

ronsor · 1h ago

I think you'll be happy if you do

revskill · 1h ago

OpenAI needs to train its models on these books, not on Stack Overflow or Reddit.

burkaman · 1h ago

They do: https://xcancel.com/vxunderground/status/1888019174133276846, https://www.theverge.com/2023/7/9/23788741/sarah-silverman-o...

The tweet only names Meta, but it would be very surprising if OpenAI didn't do the same thing.

CamperBob2 · 1h ago

Anyone who doesn't train on all material available, legal or otherwise, will be outcompeted by teams that do, including those based in countries that don't respect Western copyright law. It's that simple.

Either this practice is judged (or legislated) to be fair use, or copyright is done. It's also that simple.

atrettel · 44m ago

I'm not convinced that LLMs and other AI models need to train on all material available. A representative sample is better.

I'll ignore the legality aspects in my response. I think coming up with a representative sample of all relevant information would be better in the long term (teams will not be outcompeted on long time horizons). Why don't the companies do this? Because it is easier to just "carpet bomb the parameter space" and worry about the potential confounding [1] and sampling bias [2] later. Coming up with a representative sample requires domain expertise and that is expensive in terms of time and money. But it reduces the total amount of training data and should reduce the amount of time and resources it takes to build the models. That may matter now that models are quite large.

This is definitely a design decision with tradeoffs on both sides. I can entertain the notion that we don't have time to sample things, but I think we are all too often dismissing the long-term benefits of proper sampling.

(In terms of the legality aspects, judges are trying to "split the baby" [3] in my opinion by saying that training on stuff you got legally is OK but training on pirated material isn't. So nobody is going to recommend training on pirated material in the first place.)

[1] https://en.wikipedia.org/wiki/Confounding

[2] https://en.wikipedia.org/wiki/Sampling_bias

[3] https://www.404media.co/judge-rules-training-ai-on-authors-b...
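atrettel doesn't say how such a sample would be built. One textbook approach is proportional stratified sampling: group the corpus by some attribute and draw the same fraction from each group, so no group is over- or under-represented by chance. In this sketch the `domain` label, the corpus, and the helper name are all hypothetical stand-ins for whatever metadata a team would actually use:

```python
import random
from collections import defaultdict

def stratified_sample(items, key, fraction, seed=0):
    """Draw roughly `fraction` of the items from each stratum defined by `key`.

    Every non-empty stratum contributes at least one item, so small groups
    are not silently dropped.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical corpus: 900 fiction titles, 90 textbooks, 10 rare manuscripts.
corpus = (
    [{"domain": "fiction", "id": i} for i in range(900)]
    + [{"domain": "textbook", "id": i} for i in range(90)]
    + [{"domain": "manuscript", "id": i} for i in range(10)]
)
picked = stratified_sample(corpus, key=lambda d: d["domain"], fraction=0.1)
counts = defaultdict(int)
for doc in picked:
    counts[doc["domain"]] += 1
print(dict(counts))
```

The hard part atrettel points to is not this code but choosing the strata, which is where the domain expertise (and the cost) comes in.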

alfalfasprout · 52m ago

So, what? Authors and rights holders are supposed to just take it?

Copyright law exists for a reason. Trying to improve an LLM doesn't give you the right to flout our legal system. Yes, other countries might have an advantage in LLM training as a result but so be it.

crazygringo · 46m ago

> Authors and rights holders are supposed to just take it?

If it's judged as fair use, then yes. And then it's not flouting anything.

Remember the whole point of fair use is to benefit society by allowing reuse of material in ways that don't directly copy large portions of the material verbatim.

For example, nonfiction authors already "just take it" when reviews describe the main points of their book without paying them a cent. The justification is that it's for the greater good, and rights are limited.

atrettel · 33m ago

Judges have recently ruled [1] that training on legally obtained materials constitutes fair use, but we will have to see in the long term if that ruling holds up.

[1] https://www.404media.co/judge-rules-training-ai-on-authors-b...

Night_Thastus · 30m ago

>the whole point of fair use is to benefit society

I'll stop you right there - I really don't think that applies at all. Does 'society' really benefit when the whole thing is a funnel for enormous amounts of wealth to go to already-gigantic companies like Microsoft?

bee_rider · 20m ago

It seems like it could conceivably be fair in some sense, as long as the models were actually released as open-weights (for the benefit of society).

bfrankline · 37m ago

> Remember the whole point of fair use is to benefit society by allowing reuse of material in ways that don't directly copy large portions of the material verbatim.

How do you think masked language models work?
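Presumably the point here is that the masked-language-modeling objective literally asks a model to reconstruct hidden pieces of the original text. A toy sketch of how BERT-style training pairs are built, assuming whitespace tokenization and the commonly cited 15% masking rate; real subword tokenizers and BERT's 80/10/10 replacement scheme are deliberately omitted, and `make_mlm_example` is a hypothetical name:

```python
import random

MASK = "[MASK]"

def make_mlm_example(tokens, mask_rate=0.15, seed=0):
    """Hide a fraction of tokens; each hidden position's target is the original token."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    inputs = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = inputs[pos]  # the model is scored on recovering this exact token
        inputs[pos] = MASK
    return inputs, targets

text = "information wants to be free information also wants to be expensive"
inputs, targets = make_mlm_example(text.split())
print(" ".join(inputs))
print(targets)
```

Filling each target back into its position reproduces the source sentence verbatim, which is the crux of the disagreement above.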

bugufu8f83 · 1h ago

They do, don't they? I think OpenAI uses libgen.

Meta managed to get into a private ebook torrent tracker called Bibliotik a few years ago to use for training Llama and the resulting publicity essentially killed the tracker.
