I reverse-engineered a bug in my PPO agent that gave it a 9x performance boost

Comments (1)

wmaxlees · 4h ago

Hi HN, author here.

In my last post, I found a critical bug in my PPO agent. Fixing it was the "right" thing to do, but it tanked my agent's performance from a score of 84 all the way down to 9.

This post is the forensic investigation into why that bug was so helpful. My first hypothesis was that it simply added random noise that aided exploration. That turned out to be partly right, but it didn't explain the whole effect.

The real "secret sauce" was that the bug was adding correlated noise, creating a consistent optimistic or pessimistic bias for an entire trajectory. I managed to reverse-engineer this effect into a new, principled technique that successfully reproduced the 84 score.
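A minimal JAX sketch of the difference between the two kinds of noise, assuming the noise is added to some per-step signal such as value estimates. The comment doesn't say exactly where the bug injected it, so `values`, the array shapes, and the 0.5 scale are all hypothetical. I.i.d. noise draws a fresh sample at every step and tends to average out over a trajectory. Trajectory-correlated noise draws one offset per trajectory and applies it at every step, so the whole trajectory is pushed in the same direction.

```python
import jax
import jax.numpy as jnp

def iid_noise(key, num_traj, traj_len, scale):
    # Independent draw at every timestep: perturbations tend to cancel
    # when averaged over a trajectory.
    return scale * jax.random.normal(key, (num_traj, traj_len))

def trajectory_correlated_noise(key, num_traj, traj_len, scale):
    # One draw per trajectory, broadcast across all of its timesteps:
    # every step is shifted by the same amount, giving a consistently
    # optimistic (> 0) or pessimistic (< 0) bias for that trajectory.
    offset = scale * jax.random.normal(key, (num_traj, 1))
    return jnp.broadcast_to(offset, (num_traj, traj_len))

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
values = jnp.zeros((4, 128))  # hypothetical per-step value estimates

iid = values + iid_noise(k1, 4, 128, 0.5)
corr = values + trajectory_correlated_noise(k2, 4, 128, 0.5)

# Mean perturbation per trajectory: small for i.i.d. noise,
# equal to the full sampled offset for correlated noise.
print(jnp.mean(iid - values, axis=1))
print(jnp.mean(corr - values, axis=1))
```

With i.i.d. noise the per-trajectory mean perturbation shrinks toward zero as trajectories get longer; with the correlated version it stays at the sampled offset, which is the kind of consistent optimistic or pessimistic bias described above.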

The post is the full deep dive, from visualizing the original bug's signal to designing a new form of state-dependent exploration from scratch. Happy to answer any questions about the process or the JAX/Flax implementation.
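For readers unfamiliar with state-dependent exploration, here is a generic sketch in the style of gSDE (generalized State-Dependent Exploration, as offered in Stable-Baselines3). It is not the technique from the post; the feature vector, the policy mean, and the helper names are hypothetical. The exploration noise is a function of the state, `phi(s) @ W`, and the matrix `W` is resampled only occasionally (for example once per episode), so the perturbation stays consistent within an episode instead of being redrawn at every step.

```python
import jax
import jax.numpy as jnp

def sample_exploration_matrix(key, feature_dim, action_dim, sigma):
    # Hypothetical helper: drawn once per episode, then held fixed.
    return sigma * jax.random.normal(key, (feature_dim, action_dim))

def explore(mean_action, features, noise_matrix):
    # While noise_matrix is fixed, the noise is a deterministic function of
    # the state's features: revisiting a state reproduces the same
    # perturbation, and states with similar features get similar ones.
    return mean_action + features @ noise_matrix

key = jax.random.PRNGKey(42)
W = sample_exploration_matrix(key, feature_dim=8, action_dim=2, sigma=0.1)

phi = jnp.ones((8,))  # hypothetical state features
mu = jnp.zeros((2,))  # hypothetical policy mean action

a1 = explore(mu, phi, W)
a2 = explore(mu, phi, W)  # same state, same W: same action
```

The design choice is the same one the correlated-noise sketch illustrates: holding the random draw fixed across many steps turns per-step jitter into a consistent behavioral bias, here tied to the state rather than to a single scalar offset.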
