TFA should be compared with Tufts (2025), "A Practical Examination of AI-Generated Text Detectors for Large Language Models".[0] Tufts found that automated detection is very unreliable, while Russell found that human evaluators are quite reliable.
The explanation for the difference is that automated discrimination has relied mainly on structural factors, such as average sentence/paragraph length and the frequency of stock words/phrases and of certain parts of speech. Human evaluators look at content factors: repetition of ideas, less precise wording, generalizations rather than concrete examples, overall conceptual coherence, and factual errors.
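To make the contrast concrete, here is a rough sketch (plain Python; the stock-phrase list is made up purely for illustration) of the kind of surface statistics automated detectors lean on. Part-of-speech frequencies would need an actual tagger on top of this:

    import re
    from statistics import mean

    # Illustrative list of "stock" phrases; not taken from any real detector.
    STOCK_PHRASES = ["in conclusion", "it is important to note", "delve into", "furthermore"]

    def structural_features(text: str) -> dict:
        """Surface statistics of the sort automated detectors typically use."""
        paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text.lower())
        return {
            # average sentence length in words
            "avg_sentence_len": mean(len(re.findall(r"[A-Za-z']+", s)) for s in sentences) if sentences else 0.0,
            # average paragraph length in sentences
            "avg_paragraph_len": len(sentences) / len(paragraphs) if paragraphs else 0.0,
            # stock-phrase occurrences per word
            "stock_phrase_rate": sum(text.lower().count(p) for p in STOCK_PHRASES) / max(len(words), 1),
        }

    print(structural_features(
        "In conclusion, it is important to note that this works. Furthermore, it is short.\n\nSecond paragraph."
    ))

None of these features look at what the text actually says, which is exactly where human evaluators do their discriminating.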
For most of the issues that human evaluators catch, I can conceive of technical solutions, except for the problem of factual errors. Solving the problem of factuality requires a sufficient model of the world, which is possible only in very restricted domains. I'm afraid the end result of LLM development will be an extremely convincing purveyor of misinformation.
[0] https://arxiv.org/abs/2412.05139
There is some discussion of LLMs and models in another thread. (https://news.ycombinator.com/item?id=44625629)