I threw Claude Code at an existing codebase a few months back and quickly quit—
untangling its output was slower than writing from scratch. The fix turned out
to be process, not model horsepower.
Iteration timeline
==================
• 50 % task success - added README.md + CLAUDE.md so the model knew the project.
• 75 % - wrote one markdown file per task; Codex plans, Claude codes.
• 95 %+ - built Backlog.md, a CLI that turns a high-level spec into those task files automatically (yes, using Claude/Codex to build the tool).
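
To make the "one markdown file per task" step concrete, here is a minimal sketch of what that could look like. It is not how Backlog.md works internally: the function names, the file naming scheme, and the section headings are all assumptions made up for illustration.

```python
from pathlib import Path
import re
import tempfile


def slugify(title: str) -> str:
    """Lowercase the title and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def write_task_files(tasks: list[tuple[str, str]], out_dir: Path) -> list[Path]:
    """Write one markdown file per (title, description) pair and return the paths.

    The layout (numbered file names, Description/Plan sections) is hypothetical,
    not Backlog.md's real format.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, (title, description) in enumerate(tasks, start=1):
        path = out_dir / f"task-{number:03d}-{slugify(title)}.md"
        path.write_text(
            f"# Task {number}: {title}\n\n"
            f"## Description\n\n{description}\n\n"
            "## Plan\n\n(to be filled in by the planning agent)\n",
            encoding="utf-8",
        )
        paths.append(path)
    return paths


if __name__ == "__main__":
    demo_tasks = [
        ("Add login form", "Render an email/password form on /login."),
        ("Validate credentials", "Reject empty fields before submitting."),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_task_files(demo_tasks, Path(tmp)):
            print(path.name)
```

The point of the small files is that each agent run gets one narrowly scoped task plus the project context from README.md and CLAUDE.md, rather than the whole spec at once.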
Three-step loop that works for me
=================================
1. Generate tasks - Codex / Claude Opus → self-review.
2. Generate plan - same agent, “plan” mode → tweak if needed.
3. Implement - Claude Sonnet / Codex → review & merge.
For simple features I can even run this from my phone: ChatGPT app (Codex) → GitHub app → ChatGPT app → GitHub merge.
Repo: https://github.com/MrLesk/Backlog.md
Would love feedback and happy to answer questions!