DeepSeek may have used Google's Gemini to train its latest model

27 points by samsmithy | 14 comments | 6/4/2025, 10:24:06 AM | techcrunch.com ↗

Comments (14)

philipwhiuk · 1d ago
> Distillation isn’t an uncommon practice, but OpenAI’s terms of service prohibit customers from using the company’s model outputs to build competing AI.

I have the absolute tiniest of violins for this given OpenAI's behaviour vs everyone else's terms of service.

sovietmudkipz · 23h ago
“Copyright must evolve into the 21st century (…so that AI can legally steal everything produced by people).”

And also “Don’t steal our AI!”

jsheard · 22h ago
The world is not prepared for the mental gymnastics that OpenAI/Google/etc will employ to defend their copyright if their big models ever get leaked.
bitpush · 21h ago
I see no evidence that Google is doing this. Any sources?
Zetaphor · 22h ago
I'm still unclear on how they can claim this, considering the raw thinking traces were never exposed to the end user, only summaries.
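(For context, the kind of distillation being alleged doesn't need hidden reasoning traces or logits at all, only the final text a teacher returns: you collect prompt/response pairs and fine-tune a student on them with an ordinary language-modeling loss. A minimal sketch of that idea, using gpt2 as a stand-in student and hard-coded placeholder "teacher" responses rather than anything DeepSeek or Google actually produced:

```python
# Minimal sketch of black-box ("sequence-level") distillation: fine-tune a small
# student model on text produced by a stronger teacher. The pairs below are
# hard-coded placeholders; in practice they would come from teacher API calls.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

pairs = [
    ("Explain overfitting in one sentence.",
     "Overfitting is when a model memorizes its training data instead of learning patterns that generalize."),
    ("What does HTTP stand for?",
     "HTTP stands for HyperText Transfer Protocol."),
]

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # tiny stand-in student
student = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(student.parameters(), lr=5e-5)

student.train()
for prompt, response in pairs:
    # Standard next-token loss on the teacher's text: the student learns to
    # reproduce the teacher's outputs token by token.
    batch = tokenizer(prompt + "\n" + response, return_tensors="pt")
    loss = student(input_ids=batch["input_ids"], labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {loss.item():.3f}")
```

Only the final responses are needed, which is why summarized thinking traces don't really get in the way of this approach.)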
parineum · 1d ago
At this point, they're all using each other, because so much of the new content they're scraping for data is itself generated.

These models will converge and plateau because the datasets are only going to get worse as more of their content becomes incestuous.

jsheard · 22h ago
The default Llama 4 system prompt even instructs it to avoid using various ChatGPT-isms, presumably because they've already scraped so much GPT-generated material that it noticeably skews their model's output.
sovietmudkipz · 23h ago
I recall that AI trained on AI output over many cycles eventually becomes something akin to a noise texture, as the output degrades rapidly.

Won’t most AI-produced content released to the public be human-curated, thus heavily mitigating this degradation effect? If we’re going to see a full-length AI-generated movie, it seems like humans will be heavily involved, hand-holding the output and throwing out the AI’s nonsense.

AstroBen · 20h ago
Some will be heavily curated, by those who care about quality. That's a lot slower to produce and requires some expertise to do right, so there will be far less of it.

The vast majority of content will be (already is) whatever is fastest and easiest to create - AI slop.

wkat4242 · 1d ago
Yes indeed, some studies have already been done on this.
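(The effect those studies describe, often called "model collapse", shows up even in a toy setting: repeatedly refit a simple distribution on samples drawn from the previous fit, and the estimate drifts while its variance tends to shrink. A rough sketch using a Gaussian instead of an LLM; the sample size and generation count are arbitrary choices:

```python
# Toy version of training on your own output: each "generation" is fit only to
# samples drawn from the previous generation's fit. With finite samples the
# estimate drifts, and the variance tends to shrink toward collapse over time.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0      # generation 0: the real data distribution
n_samples = 100           # finite sample size is what drives the degradation

for generation in range(1, 31):
    samples = rng.normal(mu, sigma, size=n_samples)  # content "published" by this generation
    mu, sigma = samples.mean(), samples.std()        # next model sees only that content
    if generation % 5 == 0:
        print(f"gen {generation:2d}: mean={mu:+.3f}  std={sigma:.3f}")
```

Run it for more generations or with a smaller sample size and the collapse becomes more dramatic.)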
zackangelo · 22h ago
There might be a plateau coming but I’m not sure that will be the reason.

It seems counterintuitive, but there is some research suggesting that using synthetic data might actually be productive.

jsheard · 22h ago
I think there's probably a distinction to be made between the deliberate, careful use of synthetic data and blindly scraping 1PB of LLM-generated SEO spam and force-feeding it into a new model. Maybe the former is useful, but the latter... probably not.
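(One way to picture the "deliberate, careful" end of that spectrum: synthetic samples only make it into the training set if they pass an automated check - unit tests, a verifier, a critic model - and everything else is thrown away. A toy sketch with hard-coded placeholder candidates and a trivial arithmetic checker standing in for a real verifier:

```python
# Sketch of filtered synthetic data: keep a sample only if it passes an automated
# check, instead of ingesting everything. The candidates below are hard-coded
# placeholders; in practice they would be sampled from a model.
def passes_check(question: str, answer: str) -> bool:
    """Toy verifier: evaluate the arithmetic expression and compare. Real pipelines
    use unit tests, proof checkers, or reward/critic models instead."""
    try:
        return float(answer) == eval(question)  # assumes the question is a plain arithmetic expression
    except Exception:
        return False

candidates = [
    ("2 + 2", "4"),    # kept
    ("2 + 2", "5"),    # filtered out
    ("3 * 7", "21"),   # kept
]

curated = [(q, a) for q, a in candidates if passes_check(q, a)]
print(f"kept {len(curated)} of {len(candidates)} synthetic samples:", curated)
```

Blind scraping is the opposite case: no check at all, so whatever fraction of the crawl is slop goes straight into the next model.)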
ksymph · 21h ago
Interesting. The tonal change has definitely been noticeable. It also seems a bit more succinct and precise with its word choice, less flowery. That does seem to be in line with Gemini's behavior.
vb-8448 · 23h ago
I wonder if at this point it really matters who used whose data ...