Ask HN: OpenAI models vs. Gemini 2.5 Pro for coding and SWE
4 endorphine 5 4/23/2025, 3:35:15 PM
In your experience, which of the two (all of OpenAI's models vs. Gemini 2.5 Pro) is better as an assistant for SWE/software-systems questions and for long, complex reasoning?
I'm debating whether there's any point in paying for ChatGPT vs. paying (or even using the free version) of Gemini 2.5 Pro.
I have the feeling that most HNers prefer the latter; however, on LiveBench I think OpenAI surpasses Gemini for coding.
Regarding context windows, Gemini currently offers 1M tokens (reportedly increasing to 2M soon), GPT-4.1 also handles a large window of 1M tokens, and Claude provides 200k. In my experience testing them with large code files (around 3-4k lines), I found Gemini 2.5 Pro and Claude 3.7 Sonnet performed quite similarly: both handled the large context well and provided good solutions.
However, my impression was that GPT-4.1 didn't perform quite as well. While GPT-4.1 is certainly capable, I feel Gemini has a slight edge in this area right now. Based on this, I'd lean towards Gemini 2.5 Pro for extremely large contexts needing high-quality results, GPT-4.1 for backend logic, and Claude 3.7 for UI tasks, where I found it particularly effective.
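For scale, the arithmetic above can be sketched as a quick back-of-envelope check in Python. This assumes roughly 4 characters per token, a common rule of thumb; real tokenizers vary by model and by programming language, so treat the numbers as rough estimates only.

```python
# Back-of-envelope check: does a source file plausibly fit a model's
# context window? Assumes ~4 characters per token (rough rule of thumb,
# not an exact tokenizer).

CHARS_PER_TOKEN = 4

# Advertised context windows (in tokens), per the figures above.
CONTEXT_WINDOWS = {
    "Gemini 2.5 Pro": 1_000_000,
    "GPT-4.1": 1_000_000,
    "Claude 3.7 Sonnet": 200_000,
}

def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def fits(text: str) -> dict:
    """Map each model to whether the text plausibly fits its window."""
    n = estimate_tokens(text)
    return {model: n <= window for model, window in CONTEXT_WINDOWS.items()}

# A 4,000-line file at ~34 chars per line is ~136k chars, i.e. ~34k tokens,
# comfortably inside all three windows. At this file size, raw window
# capacity isn't the differentiator; quality of long-context use is.
big_file = "x = 1  # placeholder line of code\n" * 4_000
print(estimate_tokens(big_file))
print(fits(big_file))
```

The takeaway: a 3-4k line file uses only a few percent of even the smallest (200k) window, so the differences observed here are about how well each model reasons over a long context, not whether the file fits.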
Purely from a code quality perspective, they're all about the same, and they all generate code that rarely works on the first try. At least in my experience, results depend heavily on the language: Q-cli with Rust seems to generate better output for me than Gemini with Rust, and ChatGPT with JS gives me far better code than Claude with JS.
I honestly think that in the current market it's not really a question of which model is better, but which is the right tool for your workflow and language.