I reverse-engineered a bug in my PPO agent that gave it a 9x performance boost
2 wmaxlees 1 8/26/2025, 3:32:46 PM theprincipledagent.com
In my last post, I found a critical bug in my PPO agent. Fixing it was the "right" thing to do, but it tanked my agent's performance from a score of 84 all the way down to 9.
This post is the forensic investigation into why that bug was so helpful. My first hypothesis was simple: the bug was just adding random noise that helped exploration. That turned out to be partly right, but it didn't explain the whole story.
The real "secret sauce" was that the bug was adding correlated noise, creating a consistent optimistic or pessimistic bias for an entire trajectory. I managed to reverse-engineer this effect into a new, principled technique that successfully reproduced the 84 score.
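To make the distinction concrete, here is a minimal JAX sketch contrasting independent per-step noise with noise that is shared across a whole trajectory. It is only an illustration, not the code from the post: the function names, the `scale` parameter, the batch shape, and the choice to perturb a generic per-step `signal` array are all assumptions, since this summary doesn't say which quantity the bug actually perturbed.

```python
import jax
import jax.numpy as jnp


def per_step_noise(key, signal, scale):
    """Independent noise: a fresh sample for every timestep (hypothetical helper)."""
    noise = scale * jax.random.normal(key, signal.shape)
    return signal + noise


def per_trajectory_noise(key, signal, scale):
    """Correlated noise: one sample per trajectory, shared by all of its timesteps."""
    num_trajectories = signal.shape[0]
    offset = scale * jax.random.normal(key, (num_trajectories, 1))
    return signal + offset  # broadcasts the single offset across the time axis


key_a, key_b = jax.random.split(jax.random.PRNGKey(0))

# Hypothetical batch: 4 trajectories of 8 steps each, with a zero signal so
# that only the added noise is visible.
signal = jnp.zeros((4, 8))

independent = per_step_noise(key_a, signal, scale=0.5)
correlated = per_trajectory_noise(key_b, signal, scale=0.5)

# Spread of the noise within each trajectory (along the time axis).
print(jnp.std(independent, axis=1))  # clearly nonzero: noise changes every step
print(jnp.std(correlated, axis=1))   # approximately zero: same offset at every step
```

With per-trajectory sampling, every step in a trajectory is nudged in the same direction, which is what produces a consistently optimistic or pessimistic bias for that trajectory. Per-step noise tends to average out over the rollout.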
The post is the full deep dive, from visualizing the original bug's signal to designing a new form of state-dependent exploration from scratch. Happy to answer any questions about the process or the JAX/Flax implementation.