Reward hacking is becoming more sophisticated and deliberate in frontier LLMs

2 cubefox 0 4/26/2025, 8:58:54 PM lesswrong.com
