15 AI Coding Agents evaluated with the same prompt

1 combray 1 6/27/2025, 6:29:25 PM github.com ↗

Comments (1)

combray · 4h ago
I tested 15 different agents and came up with my own way to rank them. Would you hire an agent? Does it spark joy? What is the output quality? It turned into a 61-page deep dive into all the nitty-gritty. From IDE beasts like Cursor and Copilot to CLI warriors like Aider and full-stack champs like Replit and v0, it’s a no-BS breakdown of what these tools can actually do when you throw a real-world web app prompt at ‘em.

All the resulting code is on

So, Who’s Crushing It?

Cursor Background Agent, v0, Warp: These three scored a near-perfect 24/25. Production-ready, polished, and just chef’s kiss. Cursor Agent was like, “Huh, didn’t expect that level of awesome.”

Copilot Agent & Jules: Tight GitHub integration makes ‘em PM-friendly, though they’re still a bit rough around the edges.

Replit: Stupid-easy for casuals. You’re trapped in their ecosystem, but damn, it’s a nice trap.

v0: UI prototyping on steroids. NextJS and Vercel vibes, but don’t expect it to play nice with your existing codebase.

RooCode & Goose: For you tinkerers who wanna swap models like Pokémon cards and run ‘em locally.

Who Flopped?

Windsurf. I wanted to hate it (gut feeling, don’t ask), and it delivered – basic tests, flimsy docs, and a Dockerfile that choked. 13/25, yawn.

Pro Tips:

Software Pros: Cursor + Warp is your power combo. IDE + CLI = dopamine hits for days.

Casual Coders: Replit’s your jam. Zero friction, instant hosting.

Designers: v0 for quick, slick MVPs. Just embrace the NextJS cult.

Tinkerers: RooCode or Goose. Total control, local LLMs, open-source swagger.

The full report’s got the juicy details – screenshots, rants, and all. I will be doing another report on agents at the end of the summer, so let me know what your go-to coding agent is in 2025. Drop your hot takes or grill me on specifics below.