Especially for image data libraries, why not provide the images as a dump instead? There's no need to crawl 3 million images if the download button is right there. Put the file on a CDN or Google and you're golden.
atonse · 13h ago
How was this not a problem before with search engine crawlers?
Is this more of an issue with having 500 crawlers rather than any single one behaving badly?
Ndymium · 7h ago
Search engine crawlers generally respected robots.txt and limited themselves to a trickle of requests, likely scaled to the relative popularity of the website. These bots do neither: they crawl anything they can access and send enough requests per second to drown your server, especially if you're a self-hoster running your own little site on a dinky server.
Search engines never took my site down, these bots did.
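For reference, this is roughly the kind of robots.txt that well-behaved search crawlers honor and that the scrapers described above reportedly ignore. The bot names are just examples, and Crawl-delay is a non-standard directive that only some crawlers respect:

    # Ask general crawlers to pace themselves (non-standard; honored by e.g. Bingbot)
    User-agent: *
    Crawl-delay: 10

    # Opt known AI training crawlers out entirely
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

None of this is enforceable; it only works when the crawler chooses to cooperate, which is the point being made here.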
OutOfHere · 14h ago
Requiring PoW (proof-of-work) could take over for simple requests: reject each request until it includes a sufficient nonce. Unfortunately, this collective PoW would burden power grids even more, wasting energy, money, and computation on every request. Such is life. It would be nicer to just upgrade the servers, but that's never going to be sufficient on its own.
Yes, although the concept is simple enough in principle that a homegrown solution also works.
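As a rough sketch of what such a homegrown proof-of-work gate could look like (this is an illustration only, not any particular project's implementation; the hash choice, difficulty, and function names are assumptions):

    import hashlib
    import secrets

    DIFFICULTY_BITS = 20  # assumed difficulty: ~1M hash attempts on average

    def make_challenge() -> str:
        # Random challenge the server remembers (e.g. in a signed cookie or cache).
        return secrets.token_hex(16)

    def verify_pow(challenge: str, nonce: str, bits: int = DIFFICULTY_BITS) -> bool:
        # The request is served only if sha256(challenge + nonce) starts
        # with `bits` zero bits.
        digest = hashlib.sha256((challenge + nonce).encode()).digest()
        return int.from_bytes(digest, "big") >> (256 - bits) == 0

    def solve_pow(challenge: str, bits: int = DIFFICULTY_BITS) -> str:
        # What the client (normally JavaScript in the browser) has to grind through.
        nonce = 0
        while not verify_pow(challenge, str(nonce), bits):
            nonce += 1
        return str(nonce)

A real deployment would also bind the challenge to the client, give it an expiry, and tune the difficulty; the grinding loop on the client side is exactly the extra energy cost the surrounding comments complain about.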
Zardoz84 · 13h ago
We are wasting power feeding stochastic parrots, and now we need to waste additional power to avoid being DoSed by that feeding.
We would be better off without that useless waste of power.
treyd · 13h ago
What do you propose we, as website owners, do to prevent our websites from being DoSed in the meantime? And how do you suggest we convince (or beg) the corporations running AI scraping bots to be better users of the web?
jaoane · 5h ago
Write proper websites that do not choke that easily.
OutOfHere · 11h ago
This should be an easy question for an engineer. It depends on whether the constraint is CPU or memory or database or network.
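For example, if the database is the bottleneck, even a tiny in-process cache in front of expensive pages absorbs most repeated crawler hits. A minimal sketch, assuming a Python site and a made-up render function (the TTL and names are illustrative only):

    import time

    CACHE_TTL_SECONDS = 60  # assumed: anonymous crawler traffic rarely needs fresher data
    _cache: dict[str, tuple[float, str]] = {}

    def render_page_from_database(path: str) -> str:
        # Placeholder for the site's existing, expensive DB-backed render.
        raise NotImplementedError

    def render_page_cached(path: str) -> str:
        now = time.monotonic()
        hit = _cache.get(path)
        if hit and now - hit[0] < CACHE_TTL_SECONDS:
            return hit[1]                       # serve cached HTML, no database work
        html = render_page_from_database(path)
        _cache[path] = (now, html)
        return html

If the constraint is network or CPU instead, the equivalent move is a CDN or reverse-proxy cache in front of the origin, which the budget-and-time objection below still applies to.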
zihotki · 5h ago
Technology can't solve a human problem; the constraints are in budgets and in available time.