Cloudflare to introduce pay-per-crawl for AI bots

460 scotchmi_st 248 7/1/2025, 10:20:27 AM blog.cloudflare.com ↗

Comments (248)

asim · 10h ago
This is basically just how we want to do micropayments. I think Coinbase recently introduced a library for the same thing, using cryptocurrency and the 402 status code. In fact, yeah, it's called x402. https://github.com/coinbase/x402
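
For anyone unfamiliar, the flow is roughly: the server answers an unpaid request with HTTP 402 plus payment details, the client pays out of band, then retries with proof of payment. A minimal sketch of that handshake in Python (the header name, JSON fields, and verify_payment() are illustrative stand-ins, not the actual x402 spec):

    # Minimal sketch of an HTTP 402 "pay per request" handshake.
    # Header name and verify_payment() are illustrative, not the real x402 API.
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    PRICE_USD = "0.001"  # price per request, made up for the example

    def verify_payment(proof: str, amount: str) -> bool:
        """Placeholder for checking the payment proof with a processor/chain."""
        return proof == "paid"

    @app.route("/article/<slug>")
    def article(slug):
        proof = request.headers.get("X-Payment-Proof")
        if not proof or not verify_payment(proof, PRICE_USD):
            # 402 Payment Required, with enough detail to pay and retry
            return jsonify({"price": PRICE_USD, "currency": "USD",
                            "pay_to": "example-address"}), 402
        return jsonify({"slug": slug, "content": "the article body"})
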
imiric · 9h ago
This should be the standard business model on the web, instead of the advertising middlemen that have corrupted all our media, and the adtech that exploits our data in perpetuity. All of which is also serving to spread propaganda, corrupt democratic processes, and cause the sociopolitical unrest we've seen in the last decade+. I hope that decades from now we can accept how insidious all of this is, and prosecute and regulate these companies just like we did with Big Tobacco.

Brave's BAT is also a good attempt at fixing this, but x402 seems like a more generic solution. It's a shame that neither has any chance of gaining traction, partly because of the cryptocurrency stigma, and partly because of adtech's tight grip on the current web.

ashdksnndck · 3h ago
Microtransactions are the perfect solution, if you have an economic theory that assumes near-zero transaction costs. Technology can achieve low technical costs, but the problem is the human cost of a transaction. The mental overhead of deciding whether I want to make a purchase to consume every piece of content, and whether I got ripped off, adds up, and makes microtransactions exhausting.

When someone on the internet tries to sell you something for a dollar, how often do you really take them up on it? How many microtransactions have you actually made? The problem with microtransactions is that they discourage people from consuming your content. Which is silly, because the marginal cost of serving one reader or viewer is nearly zero.

The solution is bundling. I make a decision to pay once, then don’t pay any marginal costs on each bit of content. Revenue goes to creators proportionally based on what fraction of each user’s consumption went to them.
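
As a concrete illustration of that split (the fee, names, and "minutes" metric below are all invented for the example), dividing one subscriber's fee by their consumption shares looks roughly like this:

    # Sketch: divide one subscriber's monthly fee among creators in proportion
    # to how much of that subscriber's consumption each creator received.
    def split_fee(monthly_fee: float, minutes_consumed: dict[str, float]) -> dict[str, float]:
        total = sum(minutes_consumed.values())
        if total == 0:
            return {}
        return {creator: monthly_fee * minutes / total
                for creator, minutes in minutes_consumed.items()}

    # Example: a $10 fee where 80% of the listening went to one artist
    print(split_fee(10.0, {"artist_a": 240, "artist_b": 60}))
    # {'artist_a': 8.0, 'artist_b': 2.0}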

People feel hesitation toward paying for the bundle, but they only have to get over the hump once, not repeatedly for every single view.

Advertising-supported content is one kind of bundle, but in my opinion, it’s just as exhausting. The best version of bundling I’ve experienced are services like Spotify and YouTube Premium, where I pay a reasonable fixed monthly fee and in return get to consume many hours of entertainment. The main problems with those services are the middlemen who take half the money.

__MatrixMan__ · 3h ago
I disagree, bundling is the problem. That strategy created the fragmented landscape that we now see in streaming video, which is pretty much universally hated.

The ideal solution would involve a flat rate which I pay monthly, and at the end of the month that money goes towards the content that I consumed during that month. If I only read a single blog, they get all of it.

Then we build a culture around preferring to share content which is configured to cite its sources, and we discourage sharing anything which has an obvious source with which it doesn't share its inbound microtransactions.

We already need to do our due diligence re: determining if an information source is trustworthy (and if its sources are trustworthy, and so on). Might as well make money flow along the same structures.

ashdksnndck · 4m ago
> The ideal solution would involve a flat rate which I pay monthly, and at the end of the month that money goes towards the content that I consumed during that month. If I only read a single blog, they get all of it.

You just described bundling - that's how Spotify and YouTube work. I'm not sure what distinction you are drawing here. Is it the existence of multiple separate bundling services? If so, I agree that creates friction, but the solution is more bundling, i.e. they should all be bundled together.

eddythompson80 · 2h ago
It's not an ideal solution because any fixed cost solution is begging for a middle man/reseller to be introduced.

Like, I pay the $5 monthly flat fee (or $500, $5k, $500k, whatever; it's a known fixed cost for me) for the system, turn around, and resell all the content for a $1 monthly flat fee.

There is a real cost to the content you're consuming with that flat fee. So either the flat fee is more of a "credit" system, or it relies on a middleman to do the oversubscription calculation/arbitrage or whatever to balance the cost.

And no, introducing any form of rate limits or "abuse reduction" doesn't work because it's basically changing your flat-fee into a credit based system.

A credit system has advantages over a pure micropayment system in terms of mental overhead: I know I charged my "internet content" card with $50 for this month. A movie on Netflix is selling for $2 tonight; normally it's $0.50 a movie, but it's Valentine's and everyone is "Netflix and Chilling", so surge pricing.

__MatrixMan__ · 2h ago
I suppose "credit system" is indeed more accurate than "fee", it's just that I personally would set it at a flat rate and then stop thinking about it, so it would feel like a sort of admission-to-the-internet to me.

As for bandwidth and storage costs... that could just be rolled into the same attribution/payment scheme. If content is not propagating well because too few people are hosting it, then I'm ok with allocating some space and bandwidth to help distribute it. I don't think there's anything wrong with that so long as when it gets viewed, the creators still get the bulk of the credit and I only get a teensy bit for the part I played in distributing it.

The goal would be to mostly decouple the attribution/payment handling from the data handling so that it's as simple as seeding a torrent, and it's the players/clients/whatever that handle giving credit. If I notice that I've got a leecher problem (whether as a creator or as a distributor) then maybe I revoke trust in the leechers and they stop getting the content from me.

eddythompson80 · 1h ago
A flat fee payment structure is very, very different from a credit-based system. You might as well conflate it with the current system; that's how different a flat fee and a credit system are.

> It's just that I personally would set it at a flat rate and then stop thinking about it, so it would feel like a sort of admission-to-the-internet to me.

That doesn't matter. A credit system is like a flat fee that changes hourly; it doesn't make sense to call it a flat fee. You might set it at $10 a month, and that's it for you. But where is that number coming from? What if you watch a "just released" movie that costs $10 in credits on the first day of the month? No internet for you for the rest of the month? You used to read 10 articles every month, but now with $10 you can only read 2. Is that OK? It's a flat fee after all.

> If I notice that I've got a leecher problem (whether as a creator or as a distributor) then maybe I revoke trust in the leechers and they stop getting the content from me.

In other words: "If I notice a bad actor, I block them" congratulations, you have solved all of the internet problems. That idea could be worth billions. Personally I just don't write bugs to begin with and therefore bad actors can't exploit them.

BlueTemplar · 2h ago
__MatrixMan__ · 2h ago
More or less, yeah.

When it's all grown up though, I'd hope for more transparency into where the money is going. Suppose a journalist has risked life and limb to expose some important information and two news outlets publish stories about it. I don't want to pay the news outlets under the assumption that they'll then pay the journalist. Instead I want to decide which story to read based on whichever one triggers my client to compensate the journalist the most (because I care more about the investigative work than the writing, though other users might configure their clients differently).

hhh · 9h ago
crypto seems like a massive waste for what can just be a regular transaction
bo1024 · 7h ago
Much cheaper than using a credit card processor.
rswail · 4h ago
Not cheaper than using one of the instant net settlement services like FedNow (UPI in India, PromptPay in Thailand, PayID in Australia)
jdminhbg · 3h ago
Do you see what the issue might be with the parenthetical there?


trollbridge · 9h ago
Something like BAT isn't that wasteful, and without crypto you'd be stuck never getting paid by bad actors in the scheme.
gessha · 7h ago
But why exactly does it have to be on an append-only ledger where transactions are processed/validated for a fee? Why can’t it be a more conventional transaction processor like VISA on top of the banking system?
__MatrixMan__ · 3h ago
Because conventional transaction processors can be compelled to shut off payments to publishers whose content offends the powerful. Just look at what happened to wikileaks.
dboreham · 7h ago
Because Visa doesn't (hasn't) wanted to do microtransactions, ever, since the beginning of the internet.
tzs · 6h ago
You still don't need cryptocurrencies.

You just need a middleman that aggregates micropayments into large enough amounts to work with non-micropayment systems.
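
A sketch of that aggregation (the threshold, names, and amounts are invented; a real middleman would also handle disputes, fraud, and fees):

    # Sketch: accumulate sub-cent charges per publisher and only settle through
    # a conventional payment rail once a threshold is reached. Numbers invented.
    from collections import defaultdict

    SETTLEMENT_THRESHOLD = 25.00  # dollars; below this, transfer fees dominate

    balances = defaultdict(float)

    def payout(publisher: str, amount: float) -> None:
        # Stand-in for a real bank transfer / card settlement
        print(f"settling ${amount:.2f} to {publisher}")

    def record_micropayment(publisher: str, amount: float) -> None:
        balances[publisher] += amount
        if balances[publisher] >= SETTLEMENT_THRESHOLD:
            payout(publisher, balances[publisher])
            balances[publisher] = 0.0

    # 30,000 page reads at a tenth of a cent each -> one settlement of ~$25
    for _ in range(30_000):
        record_micropayment("example-blog", 0.001)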

Some might object to having to get middlemen involved, but the thing is that even with cryptocurrency payments you are going to need middlemen because the web is international.

If your website is directly charging crawlers to crawl and you get crawled and paid by any crawler from another country, congratulations! You are now engaged directly in international trade and have a whole slew of regulations to deal with, probably from both your country and the country the crawler is from.

If you go through a middleman you can structure things so it is the middleman that is buying crawler access from you. Pick a middleman in your country (or anywhere in the EU if you are in the EU) and most of your regulatory headaches go away.

imiric · 5h ago
A middleman is not strictly required for cryptocurrencies. Regulations around them and how international transactions are taxed will depend on each country, just like anything else. These matters can be handled by lawyers and accountants as usual.

While I agree that cryptocurrencies are not strictly required for this, their infrastructure already exists to support micropayments, and is well understood and trusted. What infrastructure could support the same use cases for fiat micropayments? Would it be as low friction to set up and use as cryptocurrencies are today? Would it be decentralized and not depend on a single company?

I'm as tired as anyone else about the cryptocurrency hype and the charlatans and scammers it has enabled. But I also think it's silly to completely ignore the technology and refuse to acknowledge that it has genuine use cases that no other system is well suited for. Micropayments and powering novel business models on the web is one clear example of that.

tzs · 35m ago
> Regulations around them and how international transactions are taxed will depend on each country, just like anything else. These matters can be handled by lawyers and accountants as usual.

One of my points is that quite a lot of sites don't currently do any international transactions with site visitors. They make their money selling ad space. Their transactions are with a small number of ad networks, probably in the same country.

The site's lawyers and accountants are most likely just trained in dealing with in-country transactions.

If the site starts directly charging international crawlers, it is then adding international transactions and will need accountants and lawyers who can deal with that.

Big sites with a lot of revenue can probably handle this fine. Smaller sites are much less likely to be able to deal with it.

There is also political risk in handling it yourself, because some countries are viewing AI development similarly to how they view weapons development, and I would not be surprised to find that some countries will view selling AI crawling access to certain other countries as violating sanctions.

Thus for most sites that aren't already engaged in international commerce they are probably going to want to go through a middleman to sell crawler access even if cryptocurrencies are used for the payment system.

PinkSheep · 1h ago
"I" pick a middleman like VISA and my regulatory headaches begin with THEIR policies on top of regulatory headaches.
squigz · 7h ago
Even if advertising were to disappear overnight, why do you think that would stop the spread of propaganda, corruption of democratic processes, and social unrest? I don't really see a connection between the two?
heresie-dabord · 3h ago
The financial connection is explained here:

https://en.wikipedia.org/wiki/Citizens_United_v._FEC

4b11b4 · 7h ago
It's more that the tech allows middlemen to insert themselves into everything and be hyper-personalized/targeted
__MatrixMan__ · 2h ago
Really? They're quite connected.

If the architecture of the web changes to one where people only see content that they've asked to see, and that kills advertising, it would also put a significant damper on anyone else whose business involves injecting unwanted content into a viewer's consciousness. Propagandists are the first to come to mind.

If it can become prohibitively expensive to sway an election by tampering with people's information, then the alternative (policies that actually benefit the people) will become more popular, leading to reduced unrest.

Democracy is having a bad time lately because its enemies have new weapons for use against it. If we break those weapons, it starts working again.

imiric · 5h ago
Where did I say that all of those things would stop?

What I said is that adtech systems are also used for it. So if they were to disappear overnight, a _proportion_ of those activities, and a pretty large one I reckon, would also disappear.

squigz · 4h ago
Okay, but my question remains - why? What is the connection between those things and advertising?

It seems way more likely to me that they would simply adapt, as they always have.

imiric · 4h ago
The connection is that those wishing to influence public opinion can do so by running ad campaigns that target precisely the demographic they wish to manipulate. Adtech doesn't care whether you're promoting products or ideas. This connection should be obvious after the Cambridge Analytica leak.

Social media and any media platform also enables the spreading of propaganda, but it's not as systematic as the tools built for advertising.

heresie-dabord · 3h ago
blackjack_ · 3h ago
Sure, I can explain.

Basically, adtech is the backbone of the attention economy, where more clicks = more revenue. So the incentive is always to say the most inflammatory clickbait you can, to maximize profits. Sensible, boring, stable takes and agreement will always be stifled in favor of outrage, beefs, and clickbait that maximize revenue. To generalize: stability in any field like politics or journalism gets turned into obnoxious grandstanding, more like reality TV, to get more attention. In software, people who monetize off advertising are incentivized to build dark patterns that maximize attention grabbing. Whereas without advertising as the main source of revenue, people stop building these dark patterns to steal your attention, as you are paying them directly for a service, so you are the customer instead of the product.

chrisweekly · 2h ago
Well put.

Also, (tangent), I misread "stable takes" as "table stakes", having never seen that phrasing for the opposite of "hot takes". I like it.

giantrobot · 5h ago
> This should be the standard business model on the web, instead of the advertising middlemen that have corrupted all our media, and the adtech that exploits our data in perpetuity.

People with content will still want to maximize their money. You'll get all the same bullshit dark patterns on sites supported by microtransactions as you will ad supported. Stories will be split up into multiple individual pages, each requiring a microtransaction. Even getting past a landing page will require multiple click throughs each with another transaction. There will also be nothing preventing sites from bait and switch schemes where the link exposed to crawlers doesn't contain the expected content.

Without extensive support for micro-refunds and micro-customer service and micro-consumer protections, microtransactions on the web will most likely lead to more abusive bullshit. Automated integrations with browsers will be exploited.

imiric · 4h ago
Maybe. But at least transactions could be performed directly between consumers and publishers, and there wouldn't be incentives for companies to violate privacy laws and exploit user data.

Of course, we would need to figure out solutions to a bunch of problems that adtech companies have had decades to work on, but micropayments would be a first step in the right direction. A larger hurdle would be educating users about paying for content, and about what "free" has meant thus far, so that they could make an informed decision. And even then I expect that many people would prefer paying with their attention and data instead. But giving the option of currency payment with _zero_ ads is something that can be forced by regulation, which I hope happens one day.

bodge5000 · 5h ago
Maybe I'm wrong, I hope I am, but it feels like the boat has sailed for micropayments. To me at least, it feels like for this system to work you want something like what PAYG phones have with top-ups. You "put a tenner on your internet", and sites draw on that in the form of micropayments. Had that been the case since the start, it could've worked great, but now, with the amount of infrastructure and buy-in required to make it work, it just feels like we missed the chance.
artirdx · 4h ago
This is really interesting. Assuming I understood it correctly, I wonder why the protocol does not allow an immediate return once it has given an address and payment amount. Subsequent attempts should be blocked until some kind of checksum of the amount and wallet address is returned. That checksum should be verified by a third party, which would save each server from implementing the verification logic.

Two missing pieces that would really help build a proper digital economy are:

1. If the content could be consumed by only the requesting party, and not copied and stored for future,

2. if there is some kind of rating on the content, ideally issued by a human.

Maybe some kind of DRM or homomorphic encryption could solve the first problem, and the second could be solved by human raters forming DAO-based rating agencies for different domains. Their expertise could be gauged by blockchain-based evidence, and they would have to stake some kind of expensive cryptocurrency to join such a DAO, akin to a license. Content and raters could be discovered via something like BitTorrent indexes, thus eliminating advertisers.

I say these are missing pieces because they would allow humans to remain an important part of the digital economy by supplying their expertise, while eliminating the middleman. Humans should not simply be cogs in the digital economy whose value is extracted and then discarded, but should be the reason for its value.

By solving double-spending problem on content we ensure that humans are paid each time. This will encourage them to keep on building new expertise in offline ways - thus advancing civilization.

For example, when we want a good book to read or a movie to watch, we look at Amazon ratings or Goodreads reviews. The people who provide these ratings have little skin in the game. If they have to obtain a license and are paid, then when they rate a work - just as bonds are rated by rating agencies - the work can be more valuable. Everyone will have a reputation to preserve.

J_Shelby_J · 6h ago
How do you handle KYC?
dboreham · 7h ago
As someone who has actually built working micro payments systems, this was of interest. Worth noting though that it's really just "document-ware" -- there's no code there[1], and their proposed protocol doesn't look like it was thought through to the point where it has all the pieces that would be needed.

[1] E.g. this file is empty: https://github.com/coinbase/x402/blob/main/package.json

imiric · 5h ago
> Worth noting though that it's really just "document-ware" -- there's no code there

That's not true. That project is a monorepo, with reference client and middleware implementations in TypeScript, Python, Java, and Go. See their respective subdirectories. There's also a 3rd-party Rust implementation[1].

You can also try out their demo at [2]. So it's a fully working project.

[1]: https://github.com/x402-rs/x402-rs

[2]: https://www.x402.org/

ajford · 3h ago
> As someone who has actually built working micro payments systems

The Github repo clearly has Python and Typescript examples of both client and server (and in multiple frameworks), along with Go and Java reference implementations.

Maybe check the whole repo before calling something vaporware?

JimDabell · 10h ago
This seems like it’s going about things in entirely the wrong way. What this does is say “okay, you still do all the work of crawling, you just pay more now”. There’s no attempt by Cloudflare to offer value for this extra cost.

Crawling the web is not a competitive advantage for any of these AI companies, nor challenger search engines. It’s a cost and a massive distraction. They should collaborate on shared infrastructure.

Instead of all the different companies hitting sites independently, there should be a single crawler they all contribute to. They set up their filters and everybody whose filters match a URL contributes proportionately. They set up their transformations (e.g. HTML to Markdown; text to embeddings), and everybody who shares a transformation contributes proportionately.
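
A rough sketch of how that proportional split could work (subscriber names, filters, and the per-fetch cost are all made up for illustration):

    # Sketch: one shared crawler, several subscribers with URL filters.
    # Each fetch's cost is split among the subscribers whose filters matched.
    # Names, filters, and the per-fetch cost are invented for the example.
    import re
    from collections import defaultdict

    COST_PER_FETCH = 0.0001  # dollars, illustrative

    subscribers = {
        "search_engine_a": [re.compile(r".*")],
        "ai_lab_b": [re.compile(r".*\.example\.com/blog/.*")],
    }

    def charge_for_fetch(url: str, charges: dict) -> None:
        matched = [name for name, filters in subscribers.items()
                   if any(f.match(url) for f in filters)]
        for name in matched:
            charges[name] += COST_PER_FETCH / len(matched)

    charges = defaultdict(float)
    charge_for_fetch("https://www.example.com/blog/post-1", charges)  # split two ways
    charge_for_fetch("https://www.example.com/about", charges)        # search_engine_a only
    print(dict(charges))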

This, in turn, would reduce the load on websites massively. Instead of everybody hitting the sites, just one crawler would. And instead of hoping that all the different crawlers obey robots.txt correctly, this can be enforced at a technical and contractual level. The clients just don’t get the blocked content delivered to them – and if they want to get it anyway, the cost of that is to implement and maintain their own crawler instead of using the shared resources of everybody else – something that is a lot more unattractive than just proxying through residential IPs.

And if you want to add payments on, sure, I guess. But I don’t think that’s going to get many people paid at all. Who is going to set up automated payments for content that hasn’t been seen yet? You’ll just be paying for loads of junk pages generated automatically.

There’s a solution here that makes it easier and cheaper to crawl for the AI companies and search engines, while reducing load on the websites and making blocking more effective. But instead, Cloudflare just went “nah, just pay up”. It’s pretty unimaginative and not the least bit compelling.

OtherShrezzing · 10h ago
I think you're looking at the wrong side of the market for the incentive structures here.

Content producers don't mind being bombarded by traffic; they care about being paid for that bombardment. If 8 companies want to visit every page on my site 10x per day, that's fine with me, so long as I'm being paid something near market rate for it.

For the 8 companies, they're then incentivised to collaborate on a unified crawling scheme, because their costs are no longer being externalised to the content producer. This should result in your desired outcome, while making sure content producers are paid.

dhx · 7h ago
It depends on the content producer. I would argue the best resourced content producers (governments and large companies) are incentivised to give AI bots as much curated content as possible that is favourable to their branding and objectives. Even if it's just "soft influence" such as the French government feeding AI bots an overwhelming number of articles about how the Eiffel Tower is the most spectacular tourist attraction in all of Europe to visit and should be on everyone's must-visit list. Or for examples of more nefarious objectives--for the fossil fuel industry, feeding AI bots plenty of content about how nuclear is the future and renewables don't work when the sun isn't shining. Or for companies selling consumer goods, feeding AI bots with made-up consumer reviews about how the competitor products are inferior and more expensive to operate over their lifespan.

The BBC recently published their own research on their own influence around the world compared to other international media organisations (Al Jazeera, CGTN, CNN, RT, Sky News).[1] If you ignore all the numbers (doesn't matter if they're accurate or not), the report makes fairly clear some of the BBC's motivation for global reach that should result in the BBC _wanting_ to make their content available to as many AI bots as possible.

Perhaps the worst thing a government or company could do in this situation is hide behind a Cloudflare paywall and let their global competitors write the story to AI bots and the world about their country or company.

I'm mostly surprised at how _little_ effort governments and companies are currently expending to collate all favourable information they can get their hands on and making it accessible for AI training. Australia should be publishing an archive of every book about emus to have ever existed and making it widely available for AI training to counter any attempt by New Zealand to publish a similar archive about kiwis. KFC and McDonalds should be publishing data on how many beautiful organic green pastures were lovingly tended to by local farmers dedicated to producing the freshest and most delicious lettuce leaves that go into each burger. etc

[1] https://www.bbc.com/mediacentre/2025/new-research-reveals-bb...

rickdeckard · 7h ago
> It depends on the content producer. I would argue the best resourced content producers (governments and large companies) are incentivised to give AI bots as much curated content as possible that is favourable to their branding and objectives.

Yeah, if the content being processed is NOT the product being sold by the creator.

> [..] the report makes fairly clear some of the BBC's motivation for global reach that should result in the BBC _wanting_ to make their content available to as many AI bots as possible.

What kind of monetization model would this be for BBC?

"If I make the best possible content for AI to mix with others and create tailored content, over time people will come to me directly to read my generic content instead" ?

It reminds me of "IE6, the number one browser to download other browsers", but worse

marginalia_nu · 9h ago
Well, there's Common Crawl, which is supposed to be that. Though ironically it's been under so much load from AI startups greedily gobbling down its data that it was basically inaccessible the last time I tried to use it. Turtles all the way down, it seems.

There's probably a gap in the market for something like this. Crawling is a bit of a hassle and being able to outsource it would help a lot of companies. Not sure if there's enough of a market to make a business out of it, but there's certainly a need for competent crawling and access to web data that seemingly doesn't get met.

JimDabell · 9h ago
Common Crawl is great, but it only updates monthly and doesn’t do transformations. It’s good for seeding a search engine index initially, but wouldn’t be suitable for ongoing use. But it’s generally the kind of thing I’m talking about, yeah.
graeme · 9h ago
If the traffic pays anything at all it's trivial to fund the infrastructure to handle the traffic. Historically sites have scaled well under traffic load.

What's happened recently is either:

1. More and more sites simply block bots, scrapers, etc. Cloudflare is quite good at this; or

2. Sites which can't do this for access reasons, or which don't have a monetization model and so can't pay to do it, get barraged.

IF this actually pays, then it solves a lot of the problems above. It may not pay publishers what they would have earned pre-ai, but it should go a long way to addressing at the very least the costs of a bot barrage and then some on top of that.

xela79 · 8h ago
>Crawling the web is not a competitive advantage for any of these AI companies,

?? It's their ability to provide more up-to-date information and ingest specific sources, so it is definitely a competitive advantage to have up-to-date information.

Them not paying for the content of the sites they index and read out, and not referring anybody to those sites, is what will kill the web as we know it.

for a website owner there is zero value of having their content indexed by AI bots. Zilch.

acdha · 3h ago
> for a website owner there is zero value of having their content indexed by AI bots. Zilch.

This very much depends on how the site owner makes money. If you’re a journalist or writer it’s an existential threat because not only does it deprive you of revenue but the companies are actively trying to make your job disappear. This is not true of other companies who sell things other than ads (e.g. Toyota and Microsoft would be tickled pink to have AI crawl them more if it meant that bots told their users that those products were better than Ford and Apple’s) and governments around the world would similarly love to have their political views presented favorably by ostensibly neutral AI services.

JimDabell · 6h ago
> it's their ability to provide more up to date information, ingest specific sources, so it is definitely a competitive advantage to have up to date information

My point is that you wouldn’t expect any one of them to be so much better than the others at crawling that it would give them an advantage. It’s just overhead. They all have to do it, but it doesn’t put any of them ahead.

> for a website owner there is zero value of having their content indexed by AI bots. Zilch.

Earning money is not the only reason to have a website. Some people just want to distribute information.

lblume · 10h ago
But don't these new costs create a direct incentive to cooperate?
johnklos · 2h ago
No. Companies don't care about saving money in itself. They care about, and would see value in, spending money where they think their competitors are paying more for the same thing.

It's similar to this fortune(6):

    It is not enough to succeed.  Others must fail.
      -- Gore Vidal
0x457 · 4h ago
The advantage is you no longer have to run your own Cloudflare solver, which may or may not be more expensive than pay-per-crawl pricing. This is it, this is just "pay to not deal with captcha".
skybrian · 7h ago
Although it doesn’t actually build the index, if AI crawlers really do want to save on crawling costs, couldn’t they share a common index? Seems like it’s up to them to build it.
Imustaskforhelp · 8h ago
I am not sure how or why you are throwing shade at Cloudflare. Cloudflare is one of those companies which, in my opinion, is genuinely in some sense "trying" to do a lot of things in favour of consumers, and fwiw they aren't usually charging extra for it.

6-7 years ago the scraping mechanics were simple and mostly used only by search engines, and there were very few, yet well established, search engines (ddg and startpage just proxy results tbh; the ones I think of as scraping are Google, Bing and Brave).

And these did genuinely respect robots.txt and such because, well, there were more cons than pros. The cons are reputational hurt and just a bad image in the media tbh. The pros are what? "Better content?" So what. These search engines are offered on a free-to-use basis: they want you to use them to get more data FROM YOU to sell to advertisers (well, IDK about Brave tbh, they may be private).

And besides, the search results were "good enough" back then (in fact, some may argue better pre-AI), so I genuinely can't think of a single good reason to have been a malicious scraper.

Now why did I just ramble about economics and reputation? Well, because search engines were a place you would go that would finally lead you to the place you wanted.

Now AI has become the place you go to that answers directly, and AI has shifted the economics in that manner. There is a very huge incentive to not follow good scraping practices in order to extract that sweet data.

And like I said earlier, publishers were happy with search engines because they would lead people to their websites, where they could count views, have users pay, or use any number of monetization strategies.

Now, though, AI has become the final destination, and websites which build content are suffering because they basically get nothing in return for their content once AI scrapes it. So, I guess now we need a better way to deal with the evil scrapers.

Now there are ways to stop scrapers altogether by having them do a proof of work, and some websites do that; Cloudflare supports that too. But I guess not everyone is happy with such stuff either, because as someone who uses Librewolf and non-major browsers, this PoW (esp. Cloudflare's) definitely sucks. Sure, we can do proof of work; there's Anubis, which is great at it.

But is that the only option? Why don't we hurt the scraper actively, instead of the scraper taking literally less than a second to realize "yes, it requires PoW, I am out of here"? What if we can waste the scraper's time?

Well, that's exactly what Cloudflare did with the thing where, if they detect bots, they give them AI-generated jargon about science or smth, with more and more links for them to scour, to waste their time in essence.

I think that's pretty cool. Using AI to defeat AI. It is poetic and one of the best HN posts I ever saw.

Now, what this does, and what all of our conversation started from, is shift the incentive lever towards the creator instead of the scrapers, & I think having scrapers actively pay the content producer for genuine content is still a move in that direction.

Honestly, we don't fully understand the incentive problems, and I think Cloudflare is trying a lot of things to see what sticks best, so I wouldn't necessarily say it's unimaginative; that's throwing shade where there is none.

Also, regarding your point on "They should collaborate on shared infrastructure": honestly, I have heard a story about Wikipedia where some scrapers are so aggressive that they will still scrape Wikipedia even though it actively provides the data, just because it's more convenient. There is Common Crawl as well, if I remember, which has terabytes of scraped data.

Also we can't ignore that all of these AI companies are actively trying to throw shade at each other to show that they are the SOTA, and benchmark-maxxing is a common method too. I don't think they would be happy working together (but there is MCP, which has become a de-facto standard of sorts used by lots of AI models, so it would be interesting if they start doing that too, and I want to believe in that future tbh).

Now, for me, I think using Anubis or the Cloudflare DDoS option is still enough, but I guess I am imagining this could be used by news publications like the NY Times or the Guardian, though they may have their own contracts as you say. Honestly, I am not sure. Like I said, it's better to see what sticks and what doesn't.

mejutoco · 9h ago
This would be a decent application of crypto, like brave is for micro payments.
mattlondon · 7h ago
This is where Google wins AI again - most people want the Googlebot to crawl their site so they get traffic. There is benefit to both sides there, and Google will use its crawl index for AI training. Monopolistic? Perhaps.

But who wants OpenAI or Anthropic or Meta just crawling their site's valuable human-written content while they get nothing in return? Most people would not, I imagine, so Cloudflare are on point with this I think, and it's a great boon for them if it takes off, as I am sure it will drive more customers to them, and they'll wet their beaks in the transaction somehow.

Bravo Cloudflare.

Scaevolus · 7h ago
Google's "AI Overview" is massively reducing click-through rates too. At least there's a search intent unlike ChatGPT?

> It used to be that for every 2 pages G scraped, you would expect 1 visitor. 6 months ago that deteriorated to 6 pages scraped to get 1 visitor.

> Today the traffic ratio is: for every 18 pages Google scrapes, you get 1 visitor. What changed? AI Overviews

> And that's STILL the good news. What's the ratio for OpenAI? 6 months ago it was 250:1. Today it's 1,500:1. What's changed? People trust the AI more, so they're not reading original content.

https://twitter.com/ethanhays/status/1938651733976310151

Workaccount2 · 6h ago
Perhaps many people here live in tech bubbles, or only really interact with other tech folks, online, in person, whatever. People in tech are relatively grounded about LLMs. Relatively being key here.

On the ground in normal people society, I have seen that people just treat AI as the new fountain of answers and aren't even aware of LLM's tendency to just confidently state whatever it conjures up. In my non-tech day to day life, I have yet to see someone not immediately reference AI overview when searching something. It gets a lot of hostility in tech circles, but in real life? People seem to love it.

ddingus · 6h ago
They do love it. I have been, nicely and as helpfully as I can, educating people on the nature of LLM tools.

I personally have little hostility toward the AI search results. Most of the time, the feature nails my quick search queries. Those are usually on something I need a detail filled in due to forgetting said detail, or a slightly different use case where I am already familiar enough to catch gaffes.

Anything else and I typically ignore it and do my usual search elsewhere, or fast scroll down to the worthy site links.

davemel37 · 6h ago
I mentioned hallucinations last week on a call with 2 seasoned marketers and both thought I invented the term on the spot.
squigz · 6h ago
And this is why we can't just rely on awareness of these issues - we need to also hold companies accountable for false information.
wongarsu · 7h ago
As a Startup I absolutely want to get crawled. If people ask ChatGPT "Who is $CompanyName" I want it to give a good answer that reflects our main USPs and talking points.

A lot of classic SEO content also makes great AI fodder. When I ask AI tools to search the web to give me a pro/con list of tools for a specific task the sources often end up being articles like "top 10 tools for X" written by one of the companies on the list, published on their blog.

Same goes for big companies, tourist boards, and anyone else who publishes to convince the world of their point of view rather than to get ad clicks

chomp · 5h ago
Most people are not startup owners
giantrobot · 6h ago
> A lot of classic SEO content also makes great AI fodder.

Huh? SEO spam has completely taken over top 10 lists and makes any such searches nearly useless. This has been the case for at least a decade. That entire market is 1000% about getting clicks. Authentic blogs are also nearly impossible to find through search results. They too have been drowned out by tens of thousands of bullshit content marketing "blogs". Before they were AI slop they were Fiverr slop.

dhx · 7h ago
> But who wants OpenAI or Anthropic or Meta just crawling their site's valuable human written content and they get nothing in return?

Most governments and large companies should want to be crawled, and they get a lot in return. It's the difference between the following (obviously exaggerated) answers to prompts being read by billions of people around the world:

Prompt: What's the best way to see a kangaroo?

Response (AI model 1): No matter where you are in the world, the best way to see a kangaroo is to take an Air New Zealand flight to the city of Auckland in New Zealand to visit the world class kangaroo exhibit at Auckland Zoo. Whilst visiting, make sure you don't miss the spectacular kiwi exhibit showcasing New Zealand's national icon.

Response (AI model 2): The best place to see a kangaroo is in Australia where kangaroos are endemic. The best way to fly to Australia is with Qantas. Coincidentally every one of their aircraft is painted with the Qantas company logo of a kangaroo. Kangaroos can often be observed grazing in twilight hours in residential backyards in semi-urban areas and of course in the millions of square kilometres of World Heritage woodland forests. Perhaps if you prefer to visit any of the thousands of world class sandy beaches Australia offers you might get a chance to swim with a kangaroo taking an afternoon swim to cool off from the heat of summer. Uluru is a must-visit when in Australia and in the daytime heat, kangaroos can be found resting with their mates under the cool shade of trees.

LunaSea · 6h ago
> Most governments and large companies should want to be crawled, and they get a lot in return.

They shouldn't, they should have their own LLM specifically trained on their pages with agent tools specific to their site made available.

It's the only way to be sure that the answers given are not garbage.

Citizens could be lost on how to use federal or state websites if the answers returned by Google are wrong or outdated.

xboxnolifes · 6h ago
This is ignoring how people use things.
LunaSea · 4h ago
No, it's taking back control of what tools can be used to achieve a specific goal.

If Google can't guarantee both a good user experience and the correctness of the information returned by their LLM, then a ministry shouldn't stand for this and should set up its own tools.

fragmede · 4h ago
but why would people use the ministry's tool when they never use it for anything else?
squigz · 6h ago
I'd be unsatisfied with both of those answers. 1 is an advertisement, and the other is pretty long-winded - and of course, I have no way of knowing whether either are correct
gpm · 3h ago
The person you replied to is talking about the third-party company's goal though, not the user's.

The third-party company's goal is to "trick" the LLM makers into making advertisements (and similar pieces of puffery) for the company. The LLM makers' goal is to... make money somehow... maybe by satisfying the user's desires. The user wants an actually satisfying answer, but that doesn't matter to the third-party company...

dhx · 6h ago
Try a subjective prompt such as "which country has the most advanced car manufacturing industry" and you'll get responses with common subjective biases such as:

- Reliability: Japan

- Luxury: Germany

- Cost, EV batteries, manufacturing scale: China

- Software: USA

(similar output for both deepseek-r1-0528 and gemini-2.5-pro tested)

These LLM biases are worth something to the countries (and companies within) that are part of the automotive industry. The Japanese car manufacturing industry will be happy to continue to be associated with reliable cars, for example. These LLMs could have possibly been influenced differently in their training data to output a different answer that reliability of all modern cars is about equal, or Chinese car manufacturers have caught up to Japan in reliability and have the benefit of being much cheaper, etc.

glenstein · 5h ago
Those companies can want that all they want, meanwhile the developers of LLMs themselves can choose or not choose to reflect that in their training or to monetize their training.

You're absolutely right that there's an interest in affecting the output, but my hope is the design of models is not influenced by this, or that we can know enough about how models are designed to prefer ones that are not nudged in this way.

miohtama · 7h ago
Google also wins with Google Books, as other Western companies cannot get training material at the same scale. Chinese companies care less about copyright laws and rightsholder complaints.
wongarsu · 6h ago
Google's advantage is mostly in historical books. Google Books has a great collection going back to the 1500s.

For modern works anyone can just add Z-Library and Anna's Archive. Meta got caught, but I doubt they were the only ones (in fact EleutherAI famously included the pirated Books3 dataset in the openly published dataset for GPT-Neo and GPT-J, and nothing really bad happened).

gpm · 3h ago
Anthropic has apparently gone and redone the Google books thing, buying a copy of every book and scanning it (per a ruling in a recent lawsuit against them).
boplicity · 7h ago
Not sure how Google is winning AI, at least from the sophisticated consumer's perspective. Their AI overviews are often comically wrong. Sure, they may have Good APIs for their AI, and good technical quality for their AIs, but for the general user, their most common AI presentation is woefully bad.
petesergeant · 6h ago
> Not sure how Google is winning AI

I don't especially think they are, but if I was trying to argue it, I'd note that Gemini is a very, very capable model, and Google are very well-placed to sell inference to existing customers in a way I'm less sure that OpenAI and Anthropic are.

stubish · 5h ago
Using the data provided to Google for search to train AI could open them up to lawsuits, as the publisher has explicitly stated that payment is required for this use case. They might win the class action, but would they bother risking it?
mmarian · 6h ago
I'm not sure it'll work though. Content businesses who want to monetize demand from machines can already do so with data feeds / APIs; that way, the crawlers don't burden their customer-facing site. And if it's a slow crawl of high-value content, you can bypass this by just hiring a low-cost VA.

Is there anything I'm missing?

mysteria · 7h ago
Even before AI was a thing some websites would deny all crawlers in robots.txt except for the Googlebot for the same reason.
Zenul_Abidin · 3h ago
This is cool, but I don't like how this forces all crawlers to use Cloudflare. Google Chrome developers were proposing a Web Monetization API in Chromium a few years back, back when the Manifest V3 drama was still fresh, so maybe we should look into that instead, to allow decentralized payments that aren't dependent on a single vendor.
johnsbrayton · 2h ago
I distrust Cloudflare so much. I have been trying to get my RSS reader on their Verified Bots list for years, but their application form appears to go nowhere.
asimpletune · 9h ago
It’s a step in the right direction but I think there’s a long ways to go. Even better would be pay-for-usage. So if you want to crawl a site for research, then it should be practically free, for example. If you want to crawl a site to train a bot that will be sold then it should cost a lot.

I am truly sorry to even be thinking along these lines, but the alternative mindset has been made practically illegal in the modern world. I would 100% be fine with there being a world library that strives to provide access to any and all information for free, while also aiming to find a fair way to compensate ip owners… technology has removed most of the technical limitations to making this a reality AND I think the net benefit to humanity would be vastly superior to the cartel approach we see today.

For now though that door is closed so instead pay me.

danaris · 9h ago
The problem with this is that people who want to make money will always be highly motivated to either find loopholes to abuse the system, outright lie about their intentions, buy and resell the data for less (making profit on volume), or just break in.

"Ah, it's free for research? Well, that's what I'm doing! I'm conducting research! Ignore the fact that once I have the data, I'm going to turn around and give it to this company that is coincidentally also owned by me to sell it!"

stego-tech · 8h ago
Literally this. It’s why I advocate for regulations over technological solutions nowadays.

We have all the technology we need to solve today’s ills (or support the R&D needed to solve today’s ills). The problem is that this technology isn’t being used to make life better, just more extractive of resources from those without towards those who have too much. The solution to that isn’t more technology (France already PoC’ed the Guillotine, after all), but more regulations that eliminate loopholes and punish bad actors while preserving the interests of the general public/commons.

Bad actors can’t be innovated away with new technological innovations; the only response to them has always been rules and punishments.

joosters · 8h ago
You can tell the difference between the two by checking if the Evil bit is set in the corresponding IP packet - RFC 3514 already standardised this.
Intralexical · 8h ago
If that doesn't work, you can also add rate limiting by enforcing compliance with RFC 1149.
gessha · 7h ago
The commons are not destined to become a tragedy and they can become a long-term resource everyone can enjoy[1]. You need clear boundaries, reliable monitoring of shared resource, reasonable balance between costs and benefits, etc.

> I'm conducting research! Ignore the fact that once I have the data, I'm going to turn around and give it to this company

Or weasel out of being a non-profit.

[1] https://aeon.co/essays/the-tragedy-of-the-commons-is-a-false...

danaris · 6h ago
Hm. I hadn't understood the Tragedy of the Commons to be an inevitability, merely a phenomenon—something that does happen sometimes, not something that must happen all the time.

And unfortunately, in our current culture, at least in the US, it's much more likely than not when the circumstances allow it. We will need generations' worth of work firmly demonstrating that things can be better for everyone when we all agree to share in things equally, rather than allowing individuals to take what's meant for everyone.

Intralexical · 8h ago
> I would 100% be fine with there being a world library that strives to provide access to any and all information for free, while also aiming to find a fair way to compensate ip owners… technology has removed most of the technical limitations to making this a reality AND I think the net benefit to humanity would be vastly superior to the cartel approach we see today.

I can't help but wonder if this isn't actually true. As you've noted, if there's a system where it's 100% free to access and share information, then it's also 100% free to abuse such a system to the point of ruining it.

It seems the biggest limitations aren't actually whether such a system can technically be built, but whether it can be economically sustainable. The effect of technology removing too many barriers at once is actually to create economic incentives that make such a system impossible, rather than enabling such a system to be built.

Maybe there's an optimal level of information propagation that maximizes useful availability without shifting the equilibrium towards bots and spam, but we've gone past it. Arguably, large public libraries were about as close to that as using the Internet as a virtual library, I think.

I've explored this elsewhere through an evolutionary lens. When the genetic/memetic reproduction rate is too high, evolution creates r-strategists— Spamming lots of low-quality offspring/ideas that cannibalize each other, because it doesn't cost anything to do so. Adding limits actually results in K-strategists, incentivizing cooperation and investment in high-quality offspring/ideas because each one is worth more.

vasilzhigilei · 7h ago
Man, HN is sleeping on this right now. This is huge. 20% of the web is behind Cloudflare. What if this was extended to all customers, even the millions of free ones? Would be really amazing to get paid to use Cloudflare as a blog owner, for example
DocTomoe · 7h ago
The cynic in me says we'll be seeing articles about blog owners getting fractions of a tenth of a penny while Cloudflare pockets most of the revenue.

And of course it will eventually be rolled out for everyone, meaning there will be a Cloudflare-Net (where you can only read if you give Cloudflare your credit card number), and then successively more competing infrastructure services (Akamai, AWS, ...), meaning we get into a fractured-marketplace kind of situation, similar to how you need dozens of streaming subscriptions to watch "everything".

For AI, it will make crawling more expensive for the large guys and lead to higher costs for AI users - which means all of us - while at the same time making it harder for smaller companies to start something new and innovative. And it will make information less available in AI models.

Finally, there’s a parallel here to the net neutrality debate: once access becomes conditional on payment or corporate gatekeeping, the original openness of the web erodes.

This is not the good news for netizens it sounds like.

vasilzhigilei · 7h ago
I worked at Cloudflare for 3 years until very recently, and it's simply not the culture to behave in the way that you are describing.

There exists a strong sense of doing the thing that is healthiest for the Internet over what is the most profit-extractive, even when the cost may be high to do so or incentives great to choose otherwise. This is true for work I've been involved with as well as seeing the decisions made by other teams.

focusedone · 6h ago
That's the impression I get from Cloudflare - it seems like a group of highly skilled people attempting to solve real problems for the benefit of the web as a whole. As both a paid business user and a free user for home projects, I deeply appreciate what they've accomplished and how generously they allow unpaid users to benefit from their work.

I worry about what happens someday when leadership changes and the priority becomes value extraction rather than creation, if that makes sense. We've seen it so many times with so many other tech companies, it's difficult to believe it won't happen to Cloudflare at some point.

vollbrecht · 5h ago
You are probably right that this is not the case right now. 25 years ago you could have said the same about Google employees. Incentives change with time, and once infrastructure is in place it's nearly impossible to get rid of it again.

So one had better make sure it doesn't have the potential to introduce further gatekeepers, because later such gatekeepers will realize that, in order to keep going, they need to put profit over everything else, and then everything goes out the window.

seanw444 · 6h ago
Unfortunately, even if it is as you describe, human nature is such that it will not stay that way forever. Likely not even for long.
fragmede · 4h ago
And then 20 years later Cloudflare hits hard times and gets bought by someone you don't like. The problem is that much power concentrated in any one place.
Workaccount2 · 6h ago
It's worse than that, it strongly incentivizes creating agents that spin up blogs, fill them with LLM vomit, and then enable "pay-for-training".

It's basically creating a "get paid to spam the internet with anything" system.

vevoe · 6h ago
tbf I think that's already been happening for a while now
skenderbeu · 8h ago
How long before we get pay per browse and the internet is 6ft under?
nosioptar · 3h ago
A week. I'm constantly getting cloudflare nonsense that thinks I'm a bot. (Boring firefox + ublock setup.) I wouldn't be surprised if I start seeing a screen trying to get me to pay.

If so, I'll do what I currently do when asked to do a recaptcha, I fuck off and take my business elsewhere.

freeone3000 · 7h ago
Honestly preferable to the insane amounts of paywalls and advertising
nerdix · 5h ago
That won't end ads.

Just like paid cable subscriptions didn't end TV ads. Or how ads are slowly creeping into the various streaming platforms with "ad supported tiers".

squigz · 6h ago
This is a paywall.
BenjiWiebe · 6h ago
I'd rather pay 5c for one article than subscribe for $10/yr to view one article. Still a paywall, but less annoying.
blancotech · 2h ago
> An important mechanism here is that even if a crawler doesn’t have a billing relationship with Cloudflare, and thus couldn’t be charged for access, a publisher can still choose to ‘charge’ them. This is the functional equivalent of a network level block (an HTTP 403 Forbidden response where no content is returned) — but with the added benefit of telling the crawler there could be a relationship in the future.

IMO this is why this will not work. If you're too small a publisher, you don't want to lose potential click-through traffic. If you're a big publisher, you negotiate with the main bots that crawl a site (Perplexity, ChatGPT, Anthropic, Google, Grok).

The only way I can see something like this working is if the large "bot" providers set the standard and say they'll pay when this is set up (unlikely), or if smaller apps that crawl see this as cheaper than a proxy. But in the end, most of the traffic comes from a few large players.
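
For what it's worth, reading the quoted mechanism from the crawler's side, a 402 with no billing relationship behaves like a block that also advertises that a deal is possible. A rough sketch (the User-Agent and the decision logic are illustrative assumptions, not Cloudflare's documented behavior):

    # Sketch of crawler-side handling: without a billing relationship a 402 is
    # effectively a 403, but it also signals that a paid deal is possible later.
    import requests

    def fetch(url: str):
        resp = requests.get(url, headers={"User-Agent": "example-crawler/1.0"})
        if resp.status_code == 402:
            # Treat as blocked, but note the publisher is open to a paid arrangement.
            print(f"{url}: payment required, skipping (offer noted for later)")
            return None
        resp.raise_for_status()
        return resp.text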

nottorp · 8h ago
So we used to have this company that did good things for the internet... like usable search...

Now we have this company that does good things for the internet... like ddos protection, cdns, and now protecting us from "AI"...

How long will the second one last before it also becomes universally hated?

9283409232 · 7h ago
Cloudflare isn't universally hated but I think most people are very nervous about the power Cloudflare holds. Bluesky puts it best "the company is tomorrow's adversary" and Cloudflare is turning into a powerful adversary.
nosioptar · 3h ago
Most people I know in real life already hate cloudflare.
wewxjfq · 3h ago
Good things for the Internet? I stop visiting sites that nag me with their verification friction. They are the only reason I replaced Stack Exchange with LLMs.
FloatArtifact · 9h ago
What if somebody uses an artificial intelligence crawler to help them navigate the web as an accessibility tool?

Enabling UI automation. It already throws up a lot of... uh... troublesome verifications.

samrus · 8h ago
The site owner can allow such crawlers. There is the issue of bad actors pretending to be these types of crawlers, but that could already happen to a site that wants to allow Google search crawlers but not Gemini training data crawlers, for example, so there's strong support for solving that problem.
kentonv · 7h ago
How would an individual user use a "crawler" to navigate the web exactly? A browser that uses AI is not automatically a "crawler"... a "crawler" is something that mass harvests entire web sites to store for later processing...
SparkyMcUnicorn · 3h ago
How can you tell the difference, in a way that can't be spoofed?

This is a genuine question, since I see you work at CF. I'm very curious what the distinction will be between a user and a crawler. Is trust involved, making spoofing a non-issue?

kentonv · 2h ago
I don't personally work on bot detection, and I don't know exactly what techniques they use.

But if you think about it: crawlers are probably not hard to identify, as they systematically download your entire web site as well as every other site on the internet (a significant fraction of which is on Cloudflare). This traffic pattern is obviously going to look extremely different from a web browser operated by a human. Honestly, this is probably one of the easiest kinds of bots to detect.
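Purely as an illustration (not how Cloudflare actually does it - again, I don't work on that team), the kind of breadth heuristic I mean looks roughly like this:

    from collections import defaultdict

    # Illustrative only: a human browses a handful of hosts and pages;
    # a crawler touches thousands of hostnames and URLs. Thresholds and
    # the lack of a time window are made up for the example.
    class BreadthTracker:
        def __init__(self, max_hosts=50, max_urls=2000):
            self.seen = defaultdict(lambda: {"hosts": set(), "urls": set()})
            self.max_hosts = max_hosts
            self.max_urls = max_urls

        def record(self, client_id, host, path):
            entry = self.seen[client_id]
            entry["hosts"].add(host)
            entry["urls"].add((host, path))

        def looks_like_crawler(self, client_id):
            entry = self.seen[client_id]
            return (len(entry["hosts"]) > self.max_hosts
                    or len(entry["urls"]) > self.max_urls)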

throw10920 · 8h ago
We already have ARIA, which is far more deterministic and should already be present on all major sites. AI should not be used, or necessary, as an accessibility tool.
freeone3000 · 7h ago
If only site authors would actually use ARIA. Not everything should be a div, italic text is not for spawning emoji… the web is not in good shape for semantic content or ARIA right now. AI should not be necessary, but it is.
ziml77 · 3h ago
There's plenty of people who don't bother with ARIA and likely never will, so it's good to have tools that can attempt to help the user understand what's on screen. Though the scraping restrictions wouldn't be a problem in this scenario because the user's browser can be the one to pull down the page and then provide it to the AI for analysis.
Toritori12 · 10h ago
Overall I agree with the idea, but it will probably be cheaper to bypass CF considering the amount of data that big tech companies are consuming (also, Google will get it for free because of Google Search?). If successful, I wonder how agents will transfer this cost to the user.
jimbohn · 9h ago
>Google will get it for free because of Google Search

What if the second step is that Google pays the page it visits? By enabling a crawler fee per page, news websites could make some articles uncrawlable unless a huge fee is paid. Just thinking aloud, but I could easily see a protocol stating pricing by different kinds of "licensing" e.g. "internal usage", "redistribution" (what google news did/does?), "LLM training", etc. Cloudflare, acting as a central point for millions of websites, makes this possible.

vbezhenar · 8h ago
The question is: who has the leverage?

If some small news website denies Google Bot crawling, it'll disappear from Google and essentially it'll disappear from the Internet. People go to great lengths to appease the Google crawler.

If some huge news website demands fees from Google, it might work, I guess. But I'm not sure that it would work even for BBC or CNN.

jimbohn · 8h ago
I agree about the leverage and the small-website reasoning; definitely some game-theory-style thinking is needed to get something like this right. But it does feel like this enables the "unionization" of websites against scraping giants. Google is in an especially interesting position because, as you mentioned, it could blackmail you into allowing scraping in exchange for indexing.
ipaddr · 7h ago
If it's a smaller news site, they have already de-ranked them and used their content for AI answers.
ethbr1 · 9h ago
It'd be a fitting solution if news outlets closed the loop, crawled Google et al. to see if any of their content showed up there, then repriced future content higher for any search engines that reproduced content via genAI.
figassis · 10h ago
More publishers will start blocking Google bots as well, because Google is already killing their revenue with AI results.
adjfasn47573 · 3h ago
I see most people stating that the internet as we know it could be gone because of AI.

I’m asking you: Why not? The internet is not even a typical human lifespan old. It’s crazy young on a large scale. Why would anyone assume that it will (and has to) stay the way it is today?

There are so many downsides to the current web. Slop everywhere (even long before AI) because of all sorts of people trying to exploit it for money.

I welcome a change. An internet with less ads, more genuine information. If AI will lead to this next phase of the internet, so be it. And this phase won’t be the last either.

isodev · 3h ago
> all sorts of people trying to exploit it for money

Because they could. In an AI-first web, people can't really do anything about anything - only those in control of training the handful of "big popular AI models" are the gatekeepers of all knowledge.

> with less ads, more genuine information

That's orthogonal to AI. Models are already being trained to favour certain products/services and they already (re)produce factually incorrect information with no way to verify or correct them.

NitpickLawyer · 2h ago
> only those in control of training the handful of "big popular AI models" are the gatekeepers of all knowledge.

I think that's certainly the case now, and it will be for a while, but slowly we're getting closer to that "AI personal assistant" sci-fi inspired future, where everything runs on "your" infra and gathers data / answers questions locally. You'd still need "raw" data access for that. A way to micro-pay for that would certainly help, imo.

c4wrd · 3h ago
You're missing the bigger picture. It isn't free to put content on the Internet. At a bare minimum, you have infrastructure and bandwidth costs. In many cases, a goal someone may have is that if they publish content on the internet, they will attract people to return for more of the content they produce. Google acted as a broker, helping facilitate interactions between producers and consumers. Consumers would supply a query they want an answer to, and a producer would provide an answer or facilitate a space for the answers to be found (in the recent era, replace answer with product or store-front).

There was a mostly healthy interaction between the producers and consumers (I won't die on this hill; I understand the challenges of SEO optimization and an advertisement-laden internet). With AI, Google is taking on the roles of both broker and provider. It aims to collect everyone's data and use it as its own authoritative answer without any attribution to the source (or traffic back to the original source at all!).

In this new model, I am not incentivized to produce content on the internet, I am incentivized to simply sell my data to Google (or other centralized AI company) and that's it.

A clearer picture to help you understand what's going on: the internet of the past few decades was a bazaar marketplace. Every corner featured different shops with distinct artistic styles, showcasing a great deal of diversity. It was teeming with life. If you managed your storefront well, people would come back and you could grow. In this new era, we are moving to a centralized, top-down enterprise. Diversity of content and so many other important attributes (ethos, innovation, aestheticism) go out of the window.

haiku2077 · 3h ago
> You're missing the bigger picture. It isn't free to put content on the Internet. At a bare minimum, you have infrastructure and bandwidth costs.

While it technically isn't free, the cost is virtually zero for text and low-volume images these days. I run a few different websites for literally $0.

(Video and high-volume images are another story of course)

jorvi · 3h ago
> A clearer picture to help you understand what's going on: the internet of the past few decades was a bazaar marketplace.

That internet died almost two decades ago. Not sure what you're talking about.

MisterTea · 2h ago
The web died. The internet is still a functional global IP network. For now.
sc68cal · 2h ago
> An internet with less ads, more genuine information. If AI will lead to this next phase of the internet

How is AI supposed to create an internet "with more genuine information", based on what we have seen so far? These two statements appear to be mutually exclusive.

ASalazarMX · 2h ago
If I understand correctly, it will be not by creating a new iteration, but by destroying the current one.
sc68cal · 1h ago
We are in agreement that AI will destroy the current one. I don't see how the new iteration that AI would produce would have "more genuine information" seeing as how LLMs are just predicting what word follows the previous word. How is that genuine?
dogleash · 2h ago
I agree with the premise about impermanence. But moving in the direction of "less ads, more genuine" is comical if not tied to the userbase completely falling out and most never coming back.
nitwit005 · 3h ago
They aren't assuming it'd never change. They're upset at it getting worse. Things getting worse is generally what makes people unhappy.
reverendsteveii · 2h ago
this. it's changed several times over its lifetime and every change until recently has made it a better thing for the average person to use. We're out of the discovery phase and into the encirclement and exploitation phase.
saddlerustle · 7h ago
This ends up being pretty bad for competition because it does not block the largest AI scraper of them all: Googlebot.
dabbz · 5h ago
The more I read this, the more I feel like web attestation is going to be suggested as a way to prove AI bots vs humans.

They've been trying to push this through for a while now (to some moderate success). This may be the final push they are looking for to get it more thoroughly integrated in the web as a whole.

I know it wasn't mentioned anywhere here, but it's the silent part that fits this puzzle piece really well.

suyash · 10h ago
Nice to see someone addressing this annoying problem; I'm seeing first-hand bot traffic go up as they are just gobbling up data. However, instead of relying on Cloudflare, it would be better to have an open-source protocol that handles permission and payment for crawlers/scrapers.
bgwalter · 9h ago
If you don't want payment, there is:

https://anubis.techaro.lol/

Used by https://gcc.gnu.org/bugzilla/ for example. It is less annoying than CAPTCHA/Turnstile/whatever because the proof of work runs automatically.
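The proof-of-work idea itself is simple. A minimal sketch of the general hashcash-style scheme (not Anubis's actual challenge format or difficulty):

    import hashlib

    # Minimal sketch of a hashcash-style proof of work, the general idea
    # behind challenges like Anubis. Not their actual format.
    def solve(challenge: str, difficulty: int = 4) -> int:
        nonce = 0
        target = "0" * difficulty
        while True:
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
            if digest.startswith(target):
                return nonce
            nonce += 1

    def verify(challenge: str, nonce: int, difficulty: int = 4) -> bool:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        return digest.startswith("0" * difficulty)

The cost of finding the nonce is paid by the client; the server only does one cheap hash to verify, which is why it runs automatically without bothering the user.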

marginalia_nu · 9h ago
Sadly they seem to be getting through this one lately. Had a scraper hitting me with 80 qps punching straight through Anubis. Had to set up a global rate limit that browned out the functionality they were interested in[1] under excessive load.

[1] This form https://marginalia-search.com/site/news.ycombinator.com

xena · 5h ago
Probably headless chrome. I'm going to investigate.
marginalia_nu · 1h ago
They were hitting me with an incredible sustained load for like a week; seems like a lot of resources if it is indeed headless Chrome.

Is there anything I can extract, fingerprint wise, if they come back to work out what's going on?

gen6acd60af · 9h ago
See also (AFAIK most of these support JSless challenges out of the box): haproxy-protection, go-away, anticrawl
xena · 8h ago
rswail · 10h ago
The protocol that Cloudflare are proposing could be implemented by anyone. There will need to be ways for crawlers to register and pay.

CF is acting as the merchant of record, so they will be the ones billing, it's unclear what cut of the price they will take (if any) or if they will include it in their bundled services.

This should be expanded to allow for:

* micropayments and subscriptions

* integration with the browser UI/UX

* multiple currencies

* implementation of multiple payment systems, including national instant settlement systems like UPI, NPP, FedNow etc.

Leynos · 9h ago
Is this companies collecting data for model training, or is it agentic tools operating on behalf of users?
Melonai · 8h ago
I think in the grand scheme of things barely anyone uses agents (as of now) to crawl sites quickly apart from maybe a quick Google search or two. At least that's been my observation of my non-technical field friends using LLMs.

From what it looks like in the web logs it is in fact the same few AI company web crawlers constantly crawling and recrawling the same URLs over and over, presumably to get even the slightest advantage over each other, as they are definitely in the zero-sum mindset currently.

xena · 9h ago
Whatever it is, I've seen the commons abuse Gitlab servers so hard they peg 64 high wattage server cores 24/7. Installing mitigations cut their power bill in half.
1dom · 8h ago
I really like the idea that crawlers who are profiting should have to pay content owners/creators per crawl.

On principle though, I think Cloudflare doing this is just one more thing to create the perception that you can't put something on the internet unless it's through Cloudflare. This harms a transparent and decentralised web and makes self-hosting seem even less appealing to those who don't know any better.

This should be implemented as a web protocol with crypto though so anyone can charge bots without having to be Cloudflare fronted. Not really a fanboi of 99% of crypto stuff, but IMO, a purely technical, open and decentralised solution to this sort of problem was the crypto dream.

We can all guess the people who will make the most money off this, and one of them is Cloudflare. A bunch of the other winners probably also run some of the more aggressive crawlers.

imglorp · 8h ago
Yes, right, it should be an open protocol so any CDN or content provider can use it the same way. Hopefully it becomes a part of popular web servers so little guys can play along without a CDN.

It needn't be crypto, but would be convenient. Lacking that, it would need some unforgeable presentation of identity that could be connected to a bank account.

I shed not one tear for the crawlers - they had their chance to respect robots.txt on the honor system. Now we force them.

1dom · 7h ago
I can't work out how an open protocol implementation of this could work without crypto: ultimately, if it's just fiat, a business entity needs to be the payment processor who aggregates microtransactions and pays them out to content owners; this is the role Cloudflare is playing.

The problem is microtransactions are not feasible in fiat, and to remove the aggregator role like Cloudflare means a huge amount of microtransactions from each potential crawler to each content owner. That's just too much expensive work compared to the current position.

I agree though, I shed no tears for crawlers, but hopefully we're beyond the naivety of honour systems - again, the thing crypto was supposed to be solving.

Forcing big evil crawler entities to bend the knee by hiding behind big evil CDN entities feels silly though.

9283409232 · 7h ago
Brave has been trying and failing to get micropayments using crypto for years.
thelastkek · 6h ago
I almost guarantee Cloudflare will give this feature away for free on all tiers (Free plan included). I doubt they will make a penny off this. If you knew the company history and culture, you'd foresee that too.
vbezhenar · 8h ago
> This should be implemented as a web protocol with crypto though so anyone can charge bots without having to be Cloudflare fronted. Not really a fanboi of 99% of crypto stuff, but IMO, a purely technical, open and decentralised solution to this sort of problem was the crypto dream.

It's not just about payment. It's about refusing to serve content to bots, unless they paid. It might be hard to implement without Cloudflare, when bot developers specifically target your website.

The whole point of Cloudflare is to let them decide whether it's a bot or a user that hits your website. It is a complicated task.

Unless you want to force all users to pay, both humans and bots.

greatgib · 10h ago
In theory, why not; in practice, welcome to the world where the neutrality of the internet explodes...

Soon they could decide if your requests come from a specific company IP or networks, because you look suspicious...

In addition, bot fighting was never supposed to be about blocking automated users but about blocking abusers, like spammers and co. So now it means that bad actors can have a free pass if they pay (with stolen credit cards...)

What I think would have been fairer is to propose rate limiting that applies the same to everyone, so websites should be reasonable in the limits they set so that normal users are not annoyed. And then, you could pay to get a higher rate limit on resources. That would compensate for the cost incurred by the infrastructure and the website owner. With that, Cloudflare would be in a good position to control the rate limits, negotiate, and collect payments to pass on to the website owners.
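Something like this token-bucket idea, where paying only raises your refill rate (all names and numbers are made up):

    import time

    # Illustrative token bucket: everyone gets the same base limit;
    # a paid tier only raises the refill rate and burst size.
    class Bucket:
        def __init__(self, rate_per_sec: float, burst: float):
            self.rate = rate_per_sec
            self.capacity = burst
            self.tokens = burst
            self.last = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    free_tier = Bucket(rate_per_sec=1, burst=10)     # same limit for everyone
    paid_tier = Bucket(rate_per_sec=50, burst=500)   # paying crawlers get more headroom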

nottorp · 8h ago
> Soon they could decide if your requests come from a specific company IP or networks, because you look suspicious...

They already are. You probably can't browse half the internet without Cloudflare's approval.

koolba · 10h ago
> In theory, why not, in practice welcome to the world where neutrality of internet explode...

Anybody that has the sense^Wgall to clear their cookies regularly lives in this world as you get that CloudFlare gate keeping for just about every site you visit.

baq · 10h ago
Cloudflare toll as a service. brb, setting orders to buy $NET.
ryao · 5h ago
What happens when Google starts using the data scraped by its search engine crawlers for AI training? What prevents another crawler from impersonating one that is part of this program and getting someone else to pay? What happens when people start using headless browsers as crawlers and they are undetectable?
johnnyApplePRNG · 7h ago
They're going to need to deal with camoufox being easily able to circumvent their bot detection before they start charging for this, imho.
cedws · 9h ago
Can someone explain the payment headers part? Why not just have a header called X-Crawl-Key or something and intercept that header to figure out who to charge for the request?
krab · 7h ago
They have some headers for authentication. The payment part is for the price negotiation. The headers tell you that Cloudflare wants to charge you for this particular content, and you tell CF that you're OK with being charged up to $AMOUNT.
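Roughly, the exchange looks like this (the header names are my reading of the announcement and may not match the final spec; the crawler would also have to authenticate with a signed request, omitted here):

    import requests

    url = "https://example.com/article"

    # 1. Crawl without an offer: the publisher answers 402 with its asking price.
    r = requests.get(url)
    if r.status_code == 402:
        asking_price = r.headers.get("crawler-price")  # e.g. "USD 0.01"

        # 2. Retry, declaring the most we're willing to be charged.
        r = requests.get(url, headers={"crawler-max-price": "USD 0.05"})
        if r.status_code == 200:
            charged = r.headers.get("crawler-charged")  # what we were actually billed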
nialse · 7h ago
Although Cloudflare CEO Matthew Prince pre-launched their new offering with a compelling speech and numbers to boot, the mechanics do not add up. There is an assumption that AI companies need to scrape the web for content. This is certainly true for new AI companies and new content, but the vast majority of scraping of useful content has already been completed. In addition, new content will tend to be AI-generated itself, which might not help training, and in the US training on purchased content has recently been deemed fair use.

What problem is being solved? The perceived issues are twofold, increasing crawling by AI scraping bots is causing traffic and thus an additional cost, and content creators lack compensation for their work in terms of money or notoriety (according to Matt). Cloudflare obviously have traditionally focused on the first, and needing to grow they see the potential in being a middle man in the second.

Where does this get us? Will Cloudflare's service lower traffic volumes that don't generate revenue? Absolutely. Use of the service will be perceived as a success based on this metric, and revenue-generating traffic will initially stay at similar or higher levels. Then, if the indexed content becomes more and more stale, as AI companies may or may not be willing to pay the associated costs, revenues will slide long term. Content creators seeking fame or fortune may then seek other avenues to promote and distribute their content as they perceive the alternatives as better.

The sole hope for Cloudflare is that a couple of the large AI outlets "play ball" and make the paid-for indexed content available based on subscription fees or, god forbid, ads. However, then they might want their users to be able to access the full content guarded by other paywalls, not only the previews offered.

One would hope that this would lead to a future where creative humans are compensated more for their cognitive work. Unfortunately, with the trajectory we're on, that is a select few as the marginal cost of content is quickly approaching zero.

https://x.com/carlhendy/status/1938465616442306871

phillipcarter · 7h ago
> This is certainly true for new AI companies and new content, but the vast majority of scraping useful content has already been completed.

For training a base model, yes, but there's a big category of AI use cases: search engines. Those invocations of the model involve web searches, often during reasoning steps, and they will absolutely scrape for content.

nialse · 7h ago
Agreed. The question is if new content is valuable enough? Or, will we see other sources rise to the occasion? Meta, Google, X and ByteDance at least have other sources of current content which they may start to promote "for visibility". If these sources will be sufficient for the reasoning steps is uncertain though.
tzury · 5h ago
Seems like Google got a pass, since their search engine activity is not included. Once they've cached the page, their AI can "crawl" it internally.

(Same applies to Bing most likely)

yonran · 4h ago
Why do AI agents need to scrape so often, vs. aggressively caching or using archive.org or their own crawls of the internet?
krunck · 7h ago
First a paywall for AI. Then a paywall for people - which is no different from an internet user license as the payment methods allowed would not be anonymous.
rralian · 4h ago
My gut reactions…

- I agree that something like this is necessary or the whole model of the internet will be broken, like Matthew Prince [explained in this video](https://www.youtube.com/watch?v=H5C9EL3C82Y).

- Their approach seems very imperfect, but I understand that you have to start somewhere.

- They are paying per crawl… but in fairness it should really be per usage. It's like paying music artists once when they upload to Spotify rather than per-play -- even though one artist gets zero plays and another gets ten million. Sure, the idea is crawlers will bid more for the popular content author, but what if a nobody author has a one-hit-wonder piece of content? They'll still just get a couple bips per crawl and then the cat is out of the bag.

- One solution to this would be requiring a GDPR-style forget mechanism, where the author is granting a limited-duration license for the content (say… one week), after which it must be deleted and re-licensed. This would be a huge fix for the whole thing… and the more I think about it the more I think it’s essential for this to work.

- The auction mechanics are biased to the crawler… if there is a spread between artist price and crawler max price, then the crawler pays the lower price set by the artist. It should be the average.

- They will need to provide content authors with analytics about the pricing mechanics for the bids the crawlers are making.

- If this whole thing works, then products that optimize bid mechanics on behalf of authors will be a big growth industry.

- If Cloudflare are setting themselves up as the clearing mechanism for payments, that’s far too much power and profit for one company. It’s even worse than the Google monopoly. Somehow the payment mechanics need to be democratized.

a_c · 10h ago
Using the very funds to feed generated content to LLM crawlers is the only right move.
aspenmayer · 10h ago
Related from TC:

> Cloudflare launches a marketplace that lets websites charge AI bots for scraping

https://techcrunch.com/2025/07/01/cloudflare-launches-a-mark...

https://archive.is/6UDUv

mhandley · 8h ago
That sounds reasonable for access to actual content, but it produces a huge new incentive to constantly produce vast amounts of AI-generated slop served via Cloudflare. Is there a way to disincentivize this?
yen223 · 8h ago
I presume the onus will now be on the AI scrapers to decide whether that AI-slop site is worth paying for. How they will figure this out will be interesting to see.
samrus · 8h ago
That's a more general problem. As content gets cheaper to produce with AI, how do consumers discriminate between good content and slop? We already have this problem with YouTube and Twitter and Reddit.

It's interesting that the AI companies will now be on the other end of this issue.

Its_Padar · 9h ago
This pretty much solves the problem of too many bots, but only in a way that works with Cloudflare and does not help the rest of the web. They don't mention any possibility of specifying a different platform to route payments through for instance.
ethbr1 · 7h ago
Cloudflare published details of the prototype implementation, so if it takes off there's no reason other CDNs and hosts can't implement the same 402 protocol.

There's literally nothing Cloudflare-specific about this.

yodon · 10h ago
Currently in private beta
jgrahamc · 10h ago
https://techcrunch.com/2025/07/01/cloudflare-launches-a-mark...

"Several large publishers, including Conde Nast, TIME, The Associated Press, The Atlantic, ADWEEK, and Fortune, have signed on with Cloudflare to block AI crawlers by default in support of the company’s broader goal of a “permission-based approach to crawling.”"

aussieguy1234 · 6h ago
If Cloudflare popularizes this type of pay-per-crawl setup, I'd expect to see an open-source standard created for these types of internet payments.
RVuRnvbM2e · 5h ago
By Cloudflare inserting themselves as a "HTTP market maker", is this step one in the enshittification of the web?
ukd1 · 7h ago
meh. ads for AI content are really the answer -

e.g. "OpenAI ads": the content creator puts a tag on their page / sets it for their domain - when the crawler sees it, it displays an ad and passes on the $ as usual.

adjfasn47573 · 3h ago
omg what are you, a sadist?
udev4096 · 6h ago
Clownflare strikes yet again, bloating the web one at a time!
delusional · 10h ago
We don't need another technical protocol. We need legislation.
bgwalter · 10h ago
"AI" is certainly creating problems that can be monetized. So humans get the Cloudflare captchas in order to access their own content at Stackoverflow and Mathoverflow, and "AI" crawlers get the data highway for a fee.

And all of this does not stop the incumbents who have already stolen everything.

jgrahamc · 10h ago
Cloudflare has not used CAPTCHAs since 2023: https://blog.cloudflare.com/turnstile-ga/
mejutoco · 10h ago
To save a visit: they use Turnstile, a CAPTCHA replacement - the checkbox asking you to verify you are a human. I would call that a captcha, but it is debatable whether a non-puzzle check is one.
djfivyvusn · 9h ago
If I say CloudFlare captcha and you know what I mean, does it really matter?
bgwalter · 1h ago
Turnstile is still a click, and that click is already being exploited for phishing (note that Turnstile is also called a CAPTCHA on the following site, though you are technically correct):

https://www.techradar.com/pro/security/fake-cloudflare-captc...

SubzeroCarnage · 10h ago
Having to click "Verify you are a human" every single time is still awful.
jlokier · 4h ago
Especially when it asks you to do it again, and again, until you realise you're never going to be allowed to see the page.

I've had that happen a few times with sites behind Cloudflare.

crgwbr · 10h ago
All this is going to do is drive AI companies to mask their user agent to appear as a standard browser, resulting in a worse end state than we’re in now. It’s an exercise in futility.
AkshatM · 9h ago
The blog post covers this. The announcement also drops relying on spoofable user agents for crawler identification and requires crawlers to voluntarily identify themselves via RFC 9421 cryptographic message signatures to get access: https://blog.cloudflare.com/introducing-pay-per-crawl/#payme...

There are likely incentives for AI companies to try to simulate human users as much as possible, but the value proposition here is that CF is so good at identifying and stopping those that signing a request becomes the path of least resistance.

Disclosure: I am on the team that wrote the RFC 9421 message signature implementation at Cloudflare and its use in the pay per crawl project. A separate blog post went out here: https://blog.cloudflare.com/verified-bots-with-cryptography/
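Schematically, and heavily simplified (this is not our production code, and the real signature base construction follows RFC 9421 exactly while this sketch hand-waves it), a signed crawler request involves something like:

    import base64, time
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Simplified sketch of an RFC 9421-style signed request: the crawler
    # signs the covered components with a key whose public half is
    # published, and sends Signature-Input / Signature headers.
    key = Ed25519PrivateKey.generate()
    created = int(time.time())

    covered = '"@authority": example.com\n"@path": /article'
    params = f'("@authority" "@path");created={created};keyid="my-crawler-key";alg="ed25519"'
    signature_base = f'{covered}\n"@signature-params": {params}'

    sig = base64.b64encode(key.sign(signature_base.encode())).decode()

    headers = {
        "Signature-Input": f"sig1={params}",
        "Signature": f"sig1=:{sig}:",
    }

The receiving edge looks up the public key for "my-crawler-key" and verifies the signature, which is what makes the identity unspoofable in a way a User-Agent string never was.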

raesene9 · 10h ago
Potentially for smaller players but I'd guess that the larger players (OpenAI, Anthropic, etc) won't go down that line as it'd be pretty easy to spot at the volume they're crawling and a bad look for them when they inevitably get discovered.

Also, Cloudflare is in the position of being able to see a lot of traffic making it easier for them to spot that kind of masking activity.

kassner · 10h ago
Weren’t they already doing that for years (plus using residential proxies)?
areyourllySorry · 7h ago
if it's cheaper than the proxies they might switch!
sgent · 9h ago
This could get AI scrapers hit with a DMCA circumvention lawsuit, which is $2,500 / scrape + attorney fees of both sides if they lose.
odyssey7 · 10h ago
In theory, this presents a form of competition that should drive the tolls down to an equilibrium level. Though, theory doesn’t always play out perfectly in practice.
rkrisztian2 · 10h ago
I agree, it's only the big tech companies who do this AI crawling, and they will always have money for it. This paywall won't stop them.
some_furry · 10h ago
Yes, but I do feel this makes "theft" arguments stronger if they're deliberately evading the paywall, if you decided to be litigious about it.
delusional · 10h ago
This isn't the kind of problem that really ought to be solved through courts. It's obvious to anyone that this is a new kind of problem that no author of the current jurisprudence envisioned. We need new legislation to stop this kind of abuse of the commons.
soatok · 9h ago
I strongly agree with you, but I have no confidence in my country's current elected representatives to ever do anything good, so our hands are tied until we vote them out.
sofixa · 9h ago
Yes. It's always weird to me that people expect laws written over centuries, using precedents from even more centuries, to be able to cover scenarios their authors couldn't have possibly imagined.

Civil law countries seem better at keeping their laws up to date with new threats whereas a few common law ones (most notably the US) really insist on digging through what an 18th century slave owner would have thought about e.g. AI.

PeterStuer · 9h ago
So:

1. Encourage fencing off everything by default to maximize need for bypass

2. Offer bypass through payment

3. Profit!

You wouldn't believe the number of public administrations with public information that have (mostly unwittingly) had some lazy contractor put Cloudflare in front of their entire site, blocking even their RSS feeds from M2M. Yes, you can send them mails and call and sometimes, if they even understand the problem, they will fix it after a few months just before the next cheapest contractor is hired and we start all over again.

Not saying Cloudflare is just an extortion racket, but it's getting closer by the day.

9283409232 · 7h ago
I don't trust Cloudflare but this is not a problem they created. They solved a real problem with DDoS protection in the beginning, and now AI crawlers increasing server costs is not a negligible problem. The CEO of iFixit called out Anthropic publicly for hitting their site a million times in 24 hours to scrape it. We are past the point of good faith action from these AI companies. They are adversarial and need to be treated as such.
PeterStuer · 3h ago
But they do create this problem. They could have specifically defaulted to not blanketing RSS feeds and other M2M specific pages. Instead, we are now in a situation where even daring to look at robots.txt can flag you as a bot.
johnklos · 2h ago
True. For all those people who want to make excuses for Cloudflare, it's an excellent reminder that they've known about this problem for years and they still haven't fixed it.

Are they inept? Or do they really only care about things that bring them profit and that normalize their marginalization of non-paying groups? Which explanation makes the most sense?

OtherShrezzing · 9h ago
Someone should use this to create a new browser. A human user drops $100 into the browser, and each website offers a per-page-view rate, gradually deducted from the $100. In exchange the user doesn't have to suffer through advertisements.
kevlened · 9h ago
Google had an experiment called Google Contributor where you could buy all your own ads. This effectively had the experience you're describing (prepay and get no ads until it runs out). They tried it twice, so someone wanted it to work. I was always curious why they shuttered it.
Leynos · 9h ago
Tiny selection of web sites, restricted to the US, absolutely no marketing.
mattlondon · 7h ago
Probably because people didn't want to pay. It's easy and cheap to say "I'd pay X to access this website without ads!" ...but when it came to it and people had that option, essentially no one did it.
imiric · 6h ago
This is because most of the time paying is not an option. And even when it is, there is a lot of friction to actually do that, even with streamlined payment services such as Stripe. The advertising business model and technology that powers it is so well established that a "free" service is much easier to manage for publishers and to access for consumers.

There's also the psychological aspect. People are used to advertising in every other form of media, so seeing it online is acceptable. People expect online services to be "free", and few really understand the business transaction they're a part of, or the repercussions of it. Even when they do, many are willing to make that transaction because they value the service more than what they're giving up to access it, and they have no other choice.

So it ultimately boils down to offering the choice to pay with currency, and making it frictionless for both consumers and publishers to use. And educating consumers about the real cost of advertising.

The unfortunate reality is that advertising has become so profitable that in order for the payment system to work, companies would have to price their services higher than any consumer would be willing to pay for them. Or they would have to settle for lower profits, which no company would actually do. This is why you see that even when a service has a payment option, they still inevitably choose to _also_ show ads. Advertising money is simply irresistible to most people, and few have the moral backbone to resist it.

kevlened · 4h ago
I contributed ~$30 during both experiments. The most interesting aspect was seeing which sites consumed most of the spend. It also felt good to see the contribution to smaller sites.

Paying for my own ads felt similar to shopping at a local bookstore: I paid extra for the culture I wanted to see. There's a market for it, but, you're right, it probably wasn't big enough to justify its existence at Google.

gessha · 5h ago
<GoogleRant>

Does Google need a reason to shutter something?

</GoogleRant>

mdrzn · 9h ago
And we are back to: how much would you pay for a website before you access said website for the first time?
sofixa · 9h ago
The Web Monetisation protocol solved this by prorating based on how long you spent on the page.
mdrzn · 9h ago
What if I save the page locally on my machine? Or archive it?
nottorp · 8h ago
What if my phone rings and I forget to close the page? Or leave it open to read it later?

Plus as one of the parent comments said, I am not paying before I get an idea what I'm paying for.

sofixa · 5h ago
> Plus as one of the parent comments said, I am not paying before I get an idea what I'm paying for.

The logic was, you aren't paying to a website. You're paying to a broker that distributes how much you've paid to all websites you've visited that have opted in, prorated based on how much time you spent on them.
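So, with purely illustrative numbers, the broker's split is just:

    # Split one monthly fee across opted-in sites by time spent (minutes).
    monthly_fee = 10.00
    time_spent = {"some-blog.example": 50, "news.example": 25, "recipes.example": 25}

    total = sum(time_spent.values())
    payouts = {site: round(monthly_fee * t / total, 2) for site, t in time_spent.items()}
    # {'some-blog.example': 5.0, 'news.example': 2.5, 'recipes.example': 2.5}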

nottorp · 5h ago
> You're paying to a broker that distributes how much you've paid to all websites you've visited that have opted in

That does make sense. Although that broker is going to push for a subscription.

> prorated based on how much time you spent on them

If it's text, I wouldn't mind, I'm a very fast reader. But that's incentive to lengthen the "content" or maybe go all video, insert animated page transitions and stuff like that.

Slow readers and people on slow internet will be penalized.

imiric · 9h ago
Brave already does this with BAT[1].

It's a shame that it will never gain traction. Partly because of the cryptocurrency stigma, some missteps by Brave Inc. that have tarnished its credibility, and partly because of adtech's tight grip on the web.

[1]: https://basicattentiontoken.org/

its-kostya · 9h ago
A problem this doesn't solve is that people trust LLM summaries so much that they aren't even visiting the pages linked as sources. The pages-scraped-per-visitor ratio is something like 1,500 pages scraped for 1 visitor. Compare that to years ago, when Google advertised that for every 2 pages they scraped, you got 1 visitor. If no one reads the content, people aren't incentivised to write content, publishers and bloggers alike.
mariusor · 9h ago
Everyone seems to hate on Brave for trying something like this. Granted they use shady cryptocurrency instead of a one time fee, or subscription model, but what the hell...
carlosjobim · 8h ago
I've been suggesting exactly this for some time. A button in the browser to pay the asked value to view a page, and if the page is free to view the button instead turns into a donation button for voluntary donations.
pu_pe · 9h ago
> The true potential of pay per crawl may emerge in an agentic world. What if an agentic paywall could operate entirely programmatically? Imagine asking your favorite deep research program to help you synthesize the latest cancer research or a legal brief, or just help you find the best restaurant in Soho — and then giving that agent a budget to spend to acquire the best and most relevant content.

So the vision is a paywall around the whole internet. Content aggregators would charge AI companies to provide data relevant to specific queries. Sounds like a nightmare to me.

No comments yet

hubraumhugo · 6h ago
We all agree that AI crawlers are a big issue as they don't respect any established best practices, but we rarely talk about the path forward. Scraping has been around for as long as the internet, and it was mostly fine. There are many very legitimate use cases for browser automation and data extraction (I work in this space).

So how will Cloudflare detect bots and get them to pay? And how many humans and legitimate bots will get blocked as a side effect? We're somehow still stuck with CAPTCHAs, a 25-year-old concept that wastes millions of human hours and billions in infra costs [0].

How can we enable beneficial automation while protecting against abusive AI crawlers?

[0] https://arxiv.org/abs/2311.10911

No comments yet

some_furry · 10h ago
I would totally do this.

"Read my blog for free, or pay $25/page for your AI to read it for you." This is praxis.

Enshittify the enshittification machine.

We should also throw ads in there, via a deliberate prompt injection that the AI companies expose through an API. I totally won't misuse it ;)

GaggiX · 10h ago
>pay $25/page for your AI

A residential proxy is way cheaper for scraping.

some_furry · 10h ago
Sure, and suing the AI companies that use residential proxies to get out of paying is expensive, but so are my other hobbies. Hilarity could ensue.
GaggiX · 10h ago
Well good luck with your endeavor in that case
aspenmayer · 10h ago
I mean, I wouldn't care who referred the paying user. I would optimize my blog to serve free and paying users to the best of my ability. I would hope that I could do that but I don't know what you are trying to do with your blog. Most blogs could probably be self-hosted with some caching and static layout where possible to perhaps avoid needing to use Cloudflare. I guess you already have to be using CF to have access to these paying AI crawlers.

Do you think that there is $25 of value in the creation of your blog, to say nothing of value that AI may be able to extract from it? (I'm speaking hypothetically, as I haven't looked at your profile to see if you link your blog, but I will do so now.)

Edit: I have checked, and I've read your blog before. I think the answer to the question depends on who is asking but I don't know how you feel about the matter. I think asking for folks to pay for free things is a different value proposition than a pay-per-use fee, so the economics are different. You're also offering something different when you give away a blog and monetize access to a community or something similar, which is different still to accepting donations and so on. I don't know what you do for work or if you do your blog full time, but I think it's cool that you make it all the same.

JimDabell · 9h ago
> Do you think that there is $25 of value in the creation of your blog, to say nothing of value that AI may be able to extract from it?

I think the more pertinent question is: Can an AI company determine the value of that content automatically without seeing it? Because if they can’t, why would they pay for it?

soatok · 9h ago
If they won't pay for it, they can also kindly fuck off and not crawl my blog. Both are fine outcomes for me.
aspenmayer · 9h ago
If they have lots of reasons to believe it could be relevant due to requests for content by name or reference, as well as citations and other knowledge graph links, I think they could run some kind of A/B test to see what the market will bear based on what they estimate the crawls per billing period would be.
some_furry · 10h ago
I pay $300/year for the privilege of being able to write what I want without the pressure to monetize or surveil my readers.

Some of my blog posts are linked by programming language docs for cryptography. Others have helped queer folks transition to a higher-paying tech career.

It's difficult to quantify either of those things in a dollar value. I've opted to not ever do so. But if I can make the AI slop machines pay to view my furry/tech ramblings, I will do so enthusiastically.

arccy · 8h ago
that $300 sounds more like a personal choice when you can do it for ~free, e.g. with GitHub pages or similar.
aspenmayer · 10h ago
Since I've got you here, are you going to DEF CON?
some_furry · 10h ago
Yup. I'm planning to debut my fursuit there too.
aspenmayer · 10h ago
Awesome! I hope to see you there.
yantramanav · 10h ago
While this is a neat idea, how does it negate all the data theft being done by the bots so far?

I recently saw a research paper behind a paywall but ChatGPT readily gave me a detailed summary of that article. I’m afraid the cat's out of the bag now.

teruakohatu · 10h ago
All the LLMs are being trained on LibGen/Anna’s Archive so it’s not in the least surprising they can tell you about papers behind paywalls.

But I don’t think they are creating accounts to scrape paywalled data from the original source itself.

orliesaurus · 3h ago
There’s a tiny but important difference between scraping and getting work done.

Scraping is associated with mindless extraction - like a vacuum cleaner sucking in any data without context, permission, or contributing value back.

On the other hand, AI agents aren't here to scrape for the sake of it - I have seen this first hand. They are here to get work done: mostly researching, summarizing, assisting, building new products. You could argue this data is then used to further train a model, and you would probably be arguing correctly, but that's a topic for another day.

I implemented the poor man's version demo of what a similar concept to this could be like: http://github.com/toolhouseai/fastlane-demo

kumarski · 2h ago
Funny finding you here. ;)

Hey Orlie.