I’ve been using GPT-4-turbo for months on production scripting tasks, where instructions like “fix only,” “don’t reformat,” or “return the full script” were followed consistently.
After GPT-4o launched, the same prompts started failing — the model rewrites logic, strips formatting, omits output, or reorders blocks even when explicitly told not to.
This GitHub repo documents specific examples with prompt/output comparisons: https://github.com/chapman4444/gpt4o-regression-report
OpenAI support asked me to submit real cases. If you’ve run into this, I’d love to compare notes or add more examples.
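If it helps anyone put together a submission, here’s a minimal sketch of how I’ve been capturing side-by-side outputs. It assumes the openai Python SDK (v1.x) and an OPENAI_API_KEY environment variable; the prompt text is just a placeholder, not one of the actual cases from the repo.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder instruction-heavy prompt; substitute your own failing case.
PROMPT = (
    "Fix only the bug on the marked line. Do not reformat, do not reorder "
    "blocks, and return the full script.\n\n"
    "<script body goes here>"
)

def run(model: str) -> str:
    # temperature=0 keeps the comparison as deterministic as the API allows
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user", "content": PROMPT}],
    )
    return resp.choices[0].message.content

for model in ("gpt-4-turbo", "gpt-4o"):
    with open(f"{model}.out.txt", "w") as f:
        f.write(run(model))
# Then diff the two files and attach them, plus the prompt, to the report.
```

Running the same prompt through both models at temperature 0 and diffing the saved outputs makes the rewrites, stripped formatting, and reordered blocks easy to show concretely.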