> Grok-4 significantly underperformed compared to expectations. Many of its initial responses were extremely short, often consisting only of a final answer without explanation.
This is very weird. Given how verbose most models usually are, there must have been something wrong with the system prompt.

Also: Grok used 89,996 input tokens compared to 591,624 for o3 high. What kind of tokenizer compresses the input that much? I'd assume all the inputs are actually the same, since the math problems and instructions are identical; the only differences are the tokenizer and the system prompt, and neither seems like it could make up that gap. Is o3 using ~500k more tokens for its system prompt?
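As a quick sanity check on the tokenizer hypothesis, here is a minimal sketch using OpenAI's tiktoken library to count tokens for the same text under two different encodings. tiktoken does not ship xAI's Grok tokenizer, so `cl100k_base` stands in purely to illustrate how much encodings typically differ; the prompt text here is a made-up example, not the actual benchmark input.

```python
# Minimal sketch: token counts of identical input text under two
# tokenizers, via OpenAI's tiktoken. cl100k_base is a stand-in since
# xAI's tokenizer isn't publicly available through tiktoken.
import tiktoken

prompt = (
    "Solve the following problem. Put your final answer in \\boxed{}.\n"
    "Let n be the smallest positive integer such that ..."  # hypothetical problem text
) * 100  # repeat to get a meaningfully sized input

for name in ("o200k_base", "cl100k_base"):
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(prompt))} tokens")

# Counts under different encodings typically differ by tens of percent
# at most -- nowhere near the ~6.6x gap between 89,996 and 591,624
# tokens, so tokenizer choice alone can't explain it.
```

If tokenizer choice only moves the count by tens of percent, the remaining ~500k-token difference has to come from what was actually sent: the system prompt, retries, or how the harness accounts for input.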