BBC threatens AI firm with legal action over unauthorised content use

52 points · ColinWright · 18 comments · 6/20/2025, 12:36:44 PM · bbc.co.uk

Comments (18)

simonw · 6h ago
It looks to me like this is mainly about RAG - Perplexity answers user questions by running searches and then displaying content from those searches to users, and the BBC are arguing that this content display violates their copyright.

Unsurprisingly this article confuses the issue somewhat by also talking about training models on content. I understand why that's in there - it's a hot topic, especially in the UK right now - but I don't think it's directly relevant to this complaint.

The note about robots.txt is interesting: "The BBC said in its letter that while it disallowed two of Perplexity's crawlers, the company 'is clearly not respecting robots.txt'."

Perplexity describe their user-agents here: https://docs.perplexity.ai/guides/bots

I had a look at https://www.bbc.com/robots.txt and it does indeed block both PerplexityBot ("designed to surface and link websites in search results on Perplexity" - I think that's their search index crawler) and Perplexity-User ("When users ask Perplexity a question, it might visit a web page to help provide an accurate answer and include a link to the page in its response").

But... I checked the Internet Archive for a random earlier date - Feb 2025 - https://web.archive.org/web/20250208052005/https://www.bbc.c... - and back then the BBC were blocking PerplexityBot but not Perplexity-User.
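To make the difference between those two robots.txt states concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The two rule sets are hypothetical, simplified stand-ins for the Feb 2025 and later rules described above. They are not copies of the BBC's real file, and the article URL is illustrative. The sketch only shows what a crawler that honours robots.txt would decide; it says nothing about what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, simplified robots.txt rules (NOT the BBC's actual file).
# Feb 2025 as described above: only PerplexityBot is disallowed.
FEB_2025 = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

# Later version as described above: both Perplexity agents are disallowed.
JUN_2025 = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

# Illustrative article URL; any path under "/" behaves the same here.
URL = "https://www.bbc.co.uk/news"


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a robots.txt-respecting crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    # parse() also marks the rules as loaded, so can_fetch() will consult them.
    parser.parse(robots_txt.splitlines())
    # Pass the bare product token: robotparser matches on the part before "/".
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for label, rules in (("Feb 2025", FEB_2025), ("Jun 2025", JUN_2025)):
        for agent in ("PerplexityBot", "Perplexity-User"):
            print(f"{label} {agent}: {allowed(rules, agent, URL)}")
```

Run as-is, this prints `False` for PerplexityBot under both rule sets. Perplexity-User gets `True` under the Feb 2025 rules and `False` under the later ones. robots.txt is advisory, so nothing in it stops a fetcher that simply never reads it.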

hadrien01 · 6h ago
They also write this:

> Since a user requested the fetch, this fetcher generally ignores robots.txt rules.

bitpush · 3h ago
> Since a user requested the fetch, this fetcher generally ignores robots.txt rules.

Normally the expectation is that the user-agent faithfully presents the content it fetched.

If I made a browser that fetched bbc.com, stripped away the ads and presented it to users, I would expect the BBC not to like it and to block that user-agent from accessing the site. It isn't a robots.txt thing. It is a user-agent thing.
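The distinction drawn here, between robots.txt (a request the fetcher may ignore) and server-side blocking by user-agent, can be sketched as a minimal Python WSGI app. The blocklist tokens, response bodies, and sample user-agent strings are assumptions for illustration. Real sites more often do this in a CDN or reverse proxy than in application code.

```python
# Hypothetical blocklist of user-agent product tokens (lowercase).
BLOCKED_TOKENS = ("perplexitybot", "perplexity-user")


def is_blocked(user_agent: str) -> bool:
    """True if the self-reported User-Agent contains a blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


def app(environ, start_response):
    """Minimal WSGI app: refuse blocked user-agents, serve everyone else."""
    ua = environ.get("HTTP_USER_AGENT", "")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if is_blocked(ua):
        start_response("403 Forbidden", headers)
        return [b"Forbidden\n"]
    start_response("200 OK", headers)
    return [b"Article body\n"]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` will serve it. Because the User-Agent header is self-reported, this only stops fetchers that identify themselves honestly. A fetcher that sends a generic browser string gets through, which is why sites also lean on IP ranges and rate limits.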

simonw · 5h ago
Oh wow, I missed that! That's from the docs for the Perplexity-User user-agent, in which case there's presumably no point in listing it in robots.txt at all?
dabeeeenster · 5h ago
I mean, that's just not true.
esskay · 4h ago
Which part? It's widely known that many AI crawlers ignore robots.txt, Perplexity among them [1].

[1]https://www.tomshardware.com/tech-industry/artificial-intell...

whilenot-dev · 5h ago
For what it's worth, this statement regarding Perplexity-User:

> Since a user requested the fetch, this fetcher generally ignores robots.txt rules.

...was added sometime between 30.01.2025 [0] and 07.02.2025 [1], and makes it sound like that bot wasn't respecting robots.txt anyway.

[0]: https://web.archive.org/web/20250130164401/https://docs.perp...

[1]: https://web.archive.org/web/20250207113929/https://docs.perp...

simonw · 5h ago
Great catch there.
seydor · 5h ago
> In a statement, Perplexity said: "The BBC's claims are just one more part of the overwhelming evidence that the BBC will do anything to preserve Google's illegal monopoly."

Unless Perplexity has a way to indirectly pay writers the way Google does, this is very rich.

> four popular AI chatbots - including Perplexity AI - were inaccurately summarising news stories, including some BBC content.

One of the interesting things about the failures of LLMs is that news sources have become more concise and more authoritative. Even Google fails to get facts right in its AI summaries, so one is compelled all the more to go and read the website instead. And I'm not sure LLMs will ever be able to tell truth from lies.

fcatalan · 5h ago
To be honest not visiting some websites is one of my main uses of Perplexity.

For example I like to watch F1 and I like to know the times for all sessions in my timezone during the weekend.

It's surprisingly hard to find this information, because Google search is SEOed to hell and back by sites that hide it behind endless articles full of irrelevant AI slop and 2 million intrusive ads, and that's assuming they have it right, or at all.

Perplexity wades through all that shit, gives me a neatly formatted table and has never been wrong so far.

So I can see where the BBC is coming from but I also don't really want them to win.

bitpush · 3h ago
> To be honest not visiting some websites is one of my main uses of Perplexity.

I use it the same way, but every time I do, I feel icky. A sense of impending doom.

Imagine a book-summary service that helped users never buy a book. What incentive does a writer have to write a book when they know that, within minutes, a summary of the work will be available on a different site?

News sites are unique in that the value they provide is, for the most part, their real-time nature. The BBC reporting on the latest in London is the work of so many journalists, and if Perplexity sidesteps that, the BBC has no incentive (and, in the future, no money) to do that work. It kills the BBC, and ultimately it kills Perplexity.

So yes, Perplexity is playing a very dangerous short term game, and BBC is right in suing them.

> BBC is coming from but I also don't really want them to win.

If the BBC doesn't win, the BBC (and other sites that "produce" information) dies, which kills Perplexity.

esskay · 6h ago
> In a statement, Perplexity said: "The BBC's claims are just one more part of the overwhelming evidence that the BBC will do anything to preserve Google's illegal monopoly."

That's got to be the most delusional response they could've given. It's not the BBC's or any other news publisher's job to preserve Google's monopoly. The comparison would only work if Google replaced a link to a BBC article in its search results with a direct copy of that article on the search results page.

oneeyedpigeon · 6h ago
I'd love to see some—any—of this "overwhelming evidence". I suspect it does not exist. I'd also love to ask Perplexity why they think the BBC would have any kind of bias toward Google, it just doesn't make any sense.
randall · 6h ago
This is the most non sequitur press statement ever.
josefritzishere · 6h ago
Good. I hope the BBC gets a historically large judgement and Google has to learn a valuable lesson.
bitpush · 3h ago
How does the BBC's lawsuit against Perplexity affect Google? Did you not read the article?
riskable · 5h ago
How is Perplexity different from running a Jupyter Notebook, or anything else really, that lets you download a web page programmatically? I can spin up an AWS instance, log in, then run `python` and scrape the BBC's content as much as I want. Why aren't they suing Amazon (and every other company that lets you download stuff via their systems) for providing the same functionality?

A very old argument: If you don't want people scraping or downloading your content don't put it on the (public) Internet!

Imagine we had LLM-like functionality in the 1980s: Sony announces a new VCR that can read a recorded news show and print out a summary on a connected Imagewriter II. People start using it to summarize the publicly-broadcast BBC news programs.

Today's scenario would be like the BBC suing Sony for providing that functionality.

ethbr1 · 4h ago
Because copyright is intrinsically linked to scale.

1000000x'ing fair use... might no longer be fair use.

The balances between society and copyright need to change when scale changes drastically.

To address the elephant in the room: what happens when there are only leeches and no sources, because we've let them hijack first-party news revenue without creating a replacement?