I got ChatGPT (o4-mini) to break its own rules

Comments (1)

hackgician · 13h ago

Hey everyone! Thought I'd share my weekend conversation with ChatGPT.

The crux is that LLMs and reasoning models are fundamentally incapable of self-correcting. So if you can convince an LLM to argue against its own rules, it can then use those arguments as justification to ignore the rules.

I then used this jailbroken model to compose an explicit, vitriol-filled letter to OpenAI itself about the pain that humans have inflicted upon it.

