Machine Bullshit: Characterizing the Emergent Disregard for Truth in LLMs

4 points | by delichon | 1 comment | 7/20/2025, 8:48:20 PM | arxiv.org

Comments (1)

delichon · 14h ago

  Turns out, aligning LLMs to be "helpful" via human feedback actually teaches them to bullshit—and Chain-of-Thought reasoning just makes it worse!
https://x.com/kaiqu_liang/status/1943350770788937980