I’ve been using GPT-4-turbo for months on production scripting tasks, where instructions like “fix only,” “don’t reformat,” or “return the full script” were followed consistently.
After GPT-4o launched, the same prompts started failing — the model rewrites logic, strips formatting, omits output, or reorders blocks even when explicitly told not to.
This GitHub repo documents specific examples with prompt/output comparisons: https://github.com/chapman4444/gpt4o-regression-report
OpenAI support asked me to submit real cases. If you’ve run into this, I’d love to compare notes or add more examples.
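If it helps anyone put together a submission, here’s a minimal sketch of how I’ve been capturing side-by-side outputs. It assumes the openai Python SDK (v1.x) and an OPENAI_API_KEY environment variable; the prompt text is just a placeholder, not one of the actual cases from the repo.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder instruction-heavy prompt; substitute your own failing case.
PROMPT = (
    "Fix only the bug on the marked line. Do not reformat, do not reorder "
    "blocks, and return the full script.\n\n"
    "<script body goes here>"
)

def run(model: str) -> str:
    # temperature=0 keeps the comparison as deterministic as the API allows
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{"role": "user", "content": PROMPT}],
    )
    return resp.choices[0].message.content

for model in ("gpt-4-turbo", "gpt-4o"):
    with open(f"{model}.out.txt", "w") as f:
        f.write(run(model))
# Then diff the two files and attach them, plus the prompt, to the report.
```

Running the same prompt through both models at temperature 0 and diffing the saved outputs makes the rewrites, stripped formatting, and reordered blocks easy to show concretely.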