How to stop Google from AI-summarising your website
58 points by teruza on 8/29/2025, 8:28:35 PM, 50 comments (teruza.com)
It's akin to me putting up billboards and stickers around town and then demanding to decide who gets to look at them.
Same thing with online publishers. If they want to control who uses their content and how, there's a tried and true solution and it's spelled "paywall".
Part of the reason for writing is to cultivate an audience, to bring like-minded people together.
Letting a middleman wedge itself between you and your reader damages that ability and does NOT benefit the writer. If the writer wants an LLM summary, they always have the option to generate it themselves. But y'know what? Most writers don't. Because they don't want LLM summaries.
---
Also, LLMs have been known to introduce biases into their output. Just yesterday somebody said they used an LLM for translation and it silently removed entire paragraphs because they triggered some filters. I for one don't want a machine that pretends to be impartial "summarizing" my opinions when in fact it's presenting a weaker version of them.
The best way to discredit an idea is not to argue against it, but to argue for it poorly.
Honestly, publishers should just allow it. If the concern is lost traffic, it could be worse: the “source” link in the summary is still above all the other results on the page. If the concern is misinformation, that’s another issue but could hopefully be solved by rewriting content, submitting accuracy reports, etc.
I do think Google needs to allow publishers to opt out of AI summary without also opting out of all “snippets” (although those have the same problem of cannibalizing clicks, so presumably if you’re worried about it for the AI summary then you should be worried about it for any other snippet too).
Or asking if you want to pay to remove false information that they generate which makes you look bad.
Header set X-Robots-Tag "noindex, nofollow, noarchive, nositelinkssearchbox, nosnippet, notranslate, noimageindex"
Of course, only the beeping Internet Archive totally ignored it and scraped my site. And now, despite me trying many times, they won't remove it.
It seems to mostly work, I also have Anubis in front of it now to keep the scrapers at bay.
(It's a personal diary website, started in 2000 before the term "blog" existed [EDIT: Not true - see below comment]. I know it's public content, I just don't want it searchable public)
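For anyone not running Apache, here is a minimal sketch of the same idea using only Python's standard-library http.server. It is a static file handler that attaches the X-Robots-Tag value from the directive above to every response. The handler name, the make_server helper and the default port are illustrative assumptions, not anything the commenter described. The header is also only a request, and a crawler is free to ignore it.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Same directives as the Apache "Header set X-Robots-Tag" line above.
ROBOTS_DIRECTIVES = ", ".join([
    "noindex", "nofollow", "noarchive", "nositelinkssearchbox",
    "nosnippet", "notranslate", "noimageindex",
])


class NoRobotsHandler(SimpleHTTPRequestHandler):
    """Hypothetical static-file handler that adds X-Robots-Tag to every response."""

    def end_headers(self) -> None:
        # end_headers() runs once per response, after the status line and the
        # other headers. send_error() calls it too, so error pages are covered.
        self.send_header("X-Robots-Tag", ROBOTS_DIRECTIVES)
        super().end_headers()


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Build (but do not start) a server for the current directory."""
    return ThreadingHTTPServer(("", port), NoRobotsHandler)
```

To try it, run make_server().serve_forever() from the directory you want to serve, then inspect the response headers with any HTTP client.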
In all honesty, if you're hosting it on the internet, why is this a problem? If you didn't want it to be backed up, why is it publicly accessible at all? I'm glad the internet archive will keep hosting this content even when the original is long gone.
Let's say I'd read your website and wanted to look it up again far in the future, only to find that the domain had expired years earlier. I'd be damn glad at least one organization had kept it readable.
Additionally, when I die, I want my website to go dark and that's that. It's a diary, it's very very mundane. My tech blog, sure, I'm 200% happy to have that scraped/archived. For the diary, I keep very up-to-date offline copies that my family have access to, should I tip over tomorrow.
I realise this goes against the usual Internet wisdom, and I'm sure there's more than one Chinese AI/bot out there that's scraped it and I have zero control over. But where I allegedly do have control, I'd like to exercise it. I don't think that's an unfair/ridiculous request.
>Good! It's literally the Internet Archive and you published it on the internet. That was your choice.
>As a general rule, people shouldn't get to remove things from the historical record.
>Sometimes we make exceptions for things that were unlawful to publish in the first place -- e.g. defamation, national secrets, certain types of obscene photos -- where there's a larger harm otherwise.
>But if you make something public, you make it public. I'm sorry you seem to at least partially regret that decision, but as a general rule, it's bad for humanity to allow people to erase things from what are now historical records we want to preserve.
But it's my content - it's not your content. I don't regret my decision, anything I really don't want public is behind a login. The website is still there, still getting crawled.
What really upsets me the MOST though is IA won't even reply to my requests to tell me "We're not going to remove it" - your reply (I am assuming from your wording you have some relationship with them, apologies if that's not the case) is the only information I've got! (Thanks)
[Note: the reply was from user crazygringo but I can't find it now, almost like they... removed it? It was public though and I'm SURE they won't mind me archiving it here for them.]
So... you believe that your and IA's behavior is or is not okay? Because it's a touch odd to start playing the other side now.
Look at the reason, and get mad at the correct people.
It might be the archive themselves, but just be sure.
I still don't fathom why they just _ignore_ the request not to be scraped with the above headers. It's rude.
Why would you NOT want the Internet Archive to scrape your website? (I'm clueless, thank you)
Yes I could password protect it (and any really personal content is locked behind being logged in, AI hasn't scraped that) but I _like_ being able to share links with people without having to also share passwords.
I realise the HN crowd is very much "More eyeballs are better for business" but this isn't business. This is a tiny, 5 hits a month (that's not me writing it) website.
They won an award for the paper, and the example they gave was a "holiday" search, where a hotel inserted its name and an airline wedged itself in as the best way to get there.
If I can find it again, I'll print and stick its link all over walls to make sure everybody knows what Google is up to.
Edit: Found it!
[0]: https://research.google/blog/mechanism-design-for-large-lang...
IMO an LLM is just a superior technology to a search engine in that it can understand vague questions, collate information and translate from other languages. In a lot of cases what I want isn't to find a particular page but to obtain information, and an LLM gets closer to that ideal.
It's nowhere near perfect yet but I won't be surprised if search engines go extinct in a decade or so.
Google even put the AI snippet above their ads, so you know how bad it stings.
And technically speaking, this citation is the first link on the results page, so you “rank” higher than all the other results. But it does take two clicks to get to your page.
They should make the citations more prominent and use the page title as anchor text. And when there’s multiple citations, the side panel should be open by default or they should put all of the citations inline as prominent links with page titles as anchor text.
This feels like the wrong solution for wanting to be compensated for information.
I don't know what the solution is, because one often doesn't know whether the information is worth paying for until after viewing it.
Yeah, maybe some will want to only read the IMDb plot summary of Lord of the Rings. I am not sure why any author would care about those people unless they are really desperate for clicks.
> and Reclaim Your Organic Traffic
Content:
> 1. Set Snippet Length to Zero with max-snippet:0
Sure, buddy, sure. Users are notorious for clicking search results that have no description, right.
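As background to the quoted tip: Google's robots directive documentation describes max-snippet:0 as equivalent to nosnippet, so the Apache header quoted earlier in the thread already asks for no snippet. Below is a rough Python sketch that checks whether a robots directive string requests no text snippet. It assumes a plain comma-separated value and deliberately ignores user-agent-scoped forms such as "googlebot: ...". The function name is hypothetical.

```python
def snippets_disabled(header_value: str) -> bool:
    """Return True if an X-Robots-Tag or robots meta value asks for no snippet.

    Simplified sketch: assumes comma-separated directives and ignores
    user-agent prefixes such as "googlebot: nosnippet".
    """
    for raw in header_value.split(","):
        directive = raw.strip().lower()
        if directive == "nosnippet":
            return True
        if directive.startswith("max-snippet:"):
            limit = directive.split(":", 1)[1].strip()
            # max-snippet:0 means no text snippet; other values set a length
            # limit (-1 means no limit), so they do not disable snippets.
            if limit == "0":
                return True
    return False


# Usage: the header from the Apache example already disables snippets.
print(snippets_disabled("noindex, nofollow, noarchive, nosnippet"))
print(snippets_disabled("max-snippet:50"))
```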