Qwen3-235B-A22B-Instruct-2507

32 points by tosh · 2 comments · 7/21/2025, 5:19:27 PM · huggingface.co

Comments (2)

jackmhny · 2h ago

these benches are crazy

    +-------------+----------+-----------+-------+-------+-------+
    | Task        | A22B-Ins |      A22B |    K2 | Opus4 | Deeps |
    +-------------+----------+-----------+-------+-------+-------+
    | GPQA        |    *77.5 |      62.9 | +75.1 | -74.9 |  68.4 |
    | AIME25      |    *70.3 |      24.7 | +49.5 |  33.9 | -46.6 |
    | LiveCB_v6   |    *51.8 |      32.9 | +48.9 |  44.6 | -45.2 |
    | ArenaHard2  |    *79.2 |     -52.0 | +66.1 |  51.5 |  45.6 |
    | BFCL_v3     |    *70.9 |     +68.0 | -65.2 |  60.1 |  64.7 |
    +-------------+----------+-----------+-------+-------+-------+

* = 1st, + = 2nd, - = 3rd

homarp · 4h ago

teased on twitter, https://x.com/JustinLin610/status/1947281769134170147

and later they will release the thinking model

on selected benchmarks, it beats kimi