> Large language models generate text one word (token) at a time. Each word is assigned a probability score, based on how likely it is to be generated next. So for a sentence like “My favourite tropical fruits are mango and…”, the word “bananas” would have a higher probability score than the word “airplanes”.
> SynthID adjusts these probability scores to generate a watermark. It's not noticeable to the human eye, and doesn’t affect the quality of the output.
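For intuition, here is a toy sketch of the probability-biasing idea the quote describes. It follows the "green list" approach from the academic literature (Kirchenbauer et al.); SynthID's actual algorithm differs in detail, and the vocabulary, seeding scheme, and `boost` constant below are all illustrative assumptions:

```python
import hashlib
import random

def green_list(prev_token: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    """Pseudo-randomly partition the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(vocab, int(len(vocab) * fraction)))

def bias_probs(probs: dict[str, float], greens: set[str], boost: float = 1.5) -> dict[str, float]:
    """Multiply the probability of 'green' tokens, then renormalise."""
    biased = {t: p * (boost if t in greens else 1.0) for t, p in probs.items()}
    total = sum(biased.values())
    return {t: p / total for t, p in biased.items()}

# Continuing "My favourite tropical fruits are mango and..."
vocab = ["mango", "bananas", "airplanes", "papaya"]
probs = {"mango": 0.4, "bananas": 0.4, "airplanes": 0.05, "papaya": 0.15}
greens = green_list("and", vocab)
new_probs = bias_probs(probs, greens)
```

A detector that knows the seeding scheme can later check whether the generated tokens fall into the "green" set more often than chance would allow.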
I think they need to be clearer about the constraints involved here. If I ask “What is the capital of France? Just the answer, no extra information.” then there’s no room to vary the probabilities without harming the quality of the output. So clearly there is a lower bound on length below which this becomes ineffective. And presumably the longer the text, the more resilient it is to alterations. So what are the constraints?
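The length constraint can be made concrete. In the hypothetical green-list detector below (not SynthID's real detector; the hashing scheme and candidate counts are invented for illustration), each token contributes roughly one coin flip of evidence, so a one-word answer like “Paris” is statistically indistinguishable from chance, while a 200-token watermarked answer is unmistakable:

```python
import hashlib
import math
import random

def is_green(prev_token: str, token: str) -> bool:
    """A token is 'green' with probability ~1/2 under unwatermarked text."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).hexdigest()
    return int(h, 16) % 2 == 0

def z_score(tokens: list[str]) -> float:
    """How far the green-token count deviates from the chance expectation n/2."""
    n = len(tokens) - 1
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (hits - n / 2) / math.sqrt(n / 4)

# A minimal answer: three bigrams, i.e. three coin flips. Zero evidence either way.
short = ["The", "capital", "is", "Paris"]

# A watermarked generator that prefers green continuations: evidence grows with length.
rng = random.Random(0)
words = [str(i) for i in range(1000)]
text = ["start"]
for _ in range(200):
    cands = rng.sample(words, 8)            # candidate next tokens
    green = [w for w in cands if is_green(text[-1], w)]
    text.append((green or cands)[0])        # pick a green one where possible
```

The z-score of the long watermarked text grows roughly like the square root of its length, which is exactly why a one-line answer leaves no room for a detectable signal.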
I also think that this is self-interest dressed up as altruism. There’s always going to be generative AI that doesn’t include watermarks, so a watermarking scheme cannot tell you if something is genuine. It is, however, useful for determining that something came from a specific provider, which could be valuable to Google in all sorts of ways.
merelysounds · 45m ago
This might be enforced in some trivial way, e.g. by requiring AI models to answer with at least a full sentence. The constraints may not be fully published, and the obscurity might make the scheme more effective, if only temporarily.
Printer tracking dots[1] are one prior solution like this: annoying, largely unknown, workarounds exist, and still surprisingly effective.
I really hope SynthID becomes a widely adopted standard - at the very least, Google should implement it across its own products like NotebookLM.
The problem is becoming urgent: more and more so-called “podcasts” are entirely fake, generated by NotebookLM and pushed to every major platform purely to farm backlinks and run blackhat SEO campaigns.
Beyond SynthID or similar watermarking standards, we also need models trained specifically [0] to detect AI-generated audio. Otherwise, the damage compounds - people might waste 30 minutes listening to a meaningless AI-generated podcast, or worse, absorb and believe misleading or outright harmful information.
Could anybody explain how this isn't easily circumvented by using a competitor's model?
Also, if everything in the future has some touch of AI inside, for example cameras using AI to slightly improve the perceived picture quality, then "made with AI" won't be a categorization that anybody lifts an eyebrow about.
michaelt · 9m ago
> Could anybody explain how this isn't easily circumvented by using a competitor's model?
If the problem is "kids are using AI to cheat on their schoolwork and it's bad PR / politicians want us to do something" then competitors' models aren't your problem.
On the other hand, if the problem is "social media is flooded with undetectable, super-realistic bots pushing zany, divisive political opinions, we need to save the free world from our own creation" then yes, your competitors' models very much are part of the problem too.
progval · 1h ago
By lobbying regulators to force your competitors to add watermarks too.
verisimi · 1h ago
If you see the mark, you'd know at least that you aren't dealing with a purely mechanical rendering of whatever-it-is.
dragonwriter · 2h ago
> Could anybody explain how this isn't easily circumvented by using a competitor's model?
Almost all the big hosted AI providers are publicly working on watermarking, at least for media (text is more of a mixed bag). Ultimately, it's probably a regulatory play: the big providers expect that legitimate concerns, their own active fearmongering, and their demonstrations of working watermarking will result in mandates for commercial AI generation services to include watermarks. This may even be part of a regulatory play to restrict availability and non-research use of open models.
mhl47 · 38m ago
Yes, but isn't the cat out of the bag already? Don't we have sufficiently strong local models that can be fine-tuned in various ways to rewrite text or alter images, and thus destroy possible watermarks?
Sure, in some cases a model might do some astounding things that always shine through, but I guess the jury is still out on these questions.
Oras · 1h ago
OpenAI has been doing something similar for generated images using C2PA [0]
It is easy to strip by just saving to a different format or doing some basic cropping.
I would love to see how SynthID addresses this issue.
If I slightly edit plain text watermarked with it, will the watermark identification be robust?
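For what it's worth, statistical text watermarks of the green-list family are designed to degrade gracefully under light editing, because the detector scores each token position independently. A sketch with a hypothetical detector (the scheme and all constants are illustrative, not SynthID's actual design):

```python
import hashlib
import math
import random

def is_green(prev_token: str, token: str) -> bool:
    """A token is 'green' with probability ~1/2 under unwatermarked text."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).hexdigest()
    return int(h, 16) % 2 == 0

def z_score(tokens: list[str]) -> float:
    """Deviation of the green-token count from the chance expectation n/2."""
    n = len(tokens) - 1
    hits = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (hits - n / 2) / math.sqrt(n / 4)

rng = random.Random(1)
vocab = [str(i) for i in range(1000)]

# Generate 300 watermarked tokens (prefer a green continuation where possible).
text = ["start"]
for _ in range(300):
    cands = rng.sample(vocab, 8)
    green = [w for w in cands if is_green(text[-1], w)]
    text.append((green or cands)[0])

# "Slight edit": overwrite 10% of the tokens with random words. Each edit only
# corrupts the two bigrams it touches, so most of the signal survives.
edited = list(text)
for i in rng.sample(range(1, len(edited)), 30):
    edited[i] = rng.choice(vocab)
```

Under this kind of scheme the score drops with every edit but stays far above the chance level until a large fraction of the text has been rewritten.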
9dev · 2h ago
You can never be sure something has been generated by a model embedding one of these anyway, so it’s pretty moot.
chii · 1h ago
i find the premise to be an invalid one personally - why must works from an AI model be identified/identifiable?
HighGoldstein · 1h ago
Video evidence of you committing a crime, for example, should be identifiable as AI-generated.
chii · 1h ago
how do we currently deal with tampered video evidence, before the advent of ai generated videos? Why can't the same methods be used for an ai generated video?
egeozcan · 3h ago
I guess this is the start of a new arms race on making generated content pass these checks undetected and detecting them anyway.
dragonwriter · 2h ago
It's not really an arms race; any gen AI system that doesn't explicitly incorporate a watermarking tool like this won't be detectable by tools that read the watermarks.
There is a kind of arms race that has existed for a while for non-watermarked content, except that the detection tools are pretty much Magic 8-ball level of reliability, so there's not a lot of effort on the counter-detection side.
HighGoldstein · 1h ago
I wonder if, conversely, authentic media can be falsely watermarked as AI-generated.
NitpickLawyer · 44m ago
When ChatGPT launched there was a rush of "solutions" to catch LLM-generated text. The problem was not just their terrible accuracy, but their even more terrible false-positive rates. The classic example was pasting in the Declaration of Independence and getting "100% AI generated". What's even sadder is that some of those solutions are still used today, and for a while they were used against students, with chilling repercussions for them.
notpushkin · 1h ago
For photos, I think the answer is yes. For texts, the wording will be changed when you watermark them, so I guess that’s a no.
peterkelly · 3h ago
Create the problem, sell the solution.
montag · 2h ago
"The watermarks are embedded across Google’s generative AI consumer products, and are imperceptible to humans."
I'd love to see the data behind this claim, especially on the audio side.
donperignon · 2h ago
Nah, that’s a solved problem if you work in the frequency domain. Same for images. Text is the hard nut to crack here.
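A minimal illustration of the frequency-domain idea alluded to above: spread-spectrum watermarking, i.e. adding a small keyed ±pattern to mid-frequency coefficients and detecting it later by correlation. Everything here (the band choice, the strength, the pure-Python DFT, the toy 128-sample "audio") is an illustrative assumption, nothing like a production audio or image watermark:

```python
import cmath
import random

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def embed(signal, key, strength=0.5):
    """Add a keyed pseudo-random +/- pattern to a band of mid frequencies."""
    rng = random.Random(key)
    X = dft(signal)
    n = len(X)
    for k in range(n // 8, n // 4):
        X[k] += strength * rng.choice([-1, 1])
    return idft(X)

def detect(signal, key):
    """Correlate the same band with the keyed pattern; large value = marked."""
    rng = random.Random(key)
    X = dft(signal)
    n = len(X)
    pattern = [rng.choice([-1, 1]) for _ in range(n // 8, n // 4)]
    coeffs = [X[k].real for k in range(n // 8, n // 4)]
    return sum(c * p for c, p in zip(coeffs, pattern))

rng = random.Random(0)
audio = [rng.uniform(-1, 1) for _ in range(128)]   # stand-in for an audio frame
marked = embed(audio, key=42)
```

Because the pattern is spread across many coefficients, the per-sample change is tiny, yet the keyed correlation shifts by a fixed, detectable amount.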
donperignon · 2h ago
I am not sure that text watermarking will be accurate, I foresee plenty of false positives.
doawoo · 2h ago
the beginning of walled garden “AI” tools has been interesting to follow
pelasaco · 2h ago
looks the same as the anti-virus companies in the 80s: write the virus, write the anti-virus, and profit!
[1]: https://en.m.wikipedia.org/wiki/Printer_tracking_dots
[0] 15,000+ ai generated fake podcasts https://www.kaggle.com/datasets/listennotes/ai-generated-fak...
[0] https://help.openai.com/en/articles/8912793-c2pa-in-chatgpt-...