Ask HN: How do you find SOTA LLMs for a task?

There are thousands of models available on Hugging Face at the moment. But whenever I need a model for a specific task, I struggle to find the SOTA model. Can you recommend how to find it?

I am not an ML practitioner; I just need models for my work, for example for coding. I know we can use Claude/Gemini models, but sometimes I want to compare them to SOTA open source models. Every week something better comes out, and reading month-old articles or finding an LLM leaderboard for a specific task is sometimes difficult. I think some kind of model picker already exists, but I don't know where.

Comments (4)

Oras · 7h ago

I usually check OpenRouter's usage rankings to learn that by category: https://openrouter.ai/rankings

Scroll down to the categories and select one from the dropdown at the top right of the chart.

throwaw12 · 6h ago

That's a nice addition to my tool set :) thanks!

But it seems to mostly reflect proprietary models (because they are easier to serve).

incomingpain · 7h ago

For open source, you're not going to find good stats online. OpenHands + Devstral doesn't touch the internet, so it won't make it into many stats.

You can look at benchmarks.

https://livebench.ai/#/?Agentic+Coding=a

Keep scrolling until you see something your size. DeepSeek R1 is nice, but 600B isn't running on my hardware. You'll also notice they aren't covering everything; it's dominated by the SaaS options.
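To make "something your size" concrete, here is a back-of-the-envelope sketch for estimating whether a model's weights fit in a given amount of memory. The function names, the 1.2x overhead factor (a rough allowance for KV cache and activations), and the 24 GB budget are assumptions for illustration, not measured values; real requirements depend on the runtime, context length, and quantization format.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: int = 16) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # (params_billions * 1e9 weights) * bytes_per_weight / 1e9 bytes per GB
    return params_billions * bytes_per_weight


def fits(params_billions: float, available_gb: float,
         bits_per_weight: int = 16, overhead: float = 1.2) -> bool:
    """Rough check; `overhead` is an assumed fudge factor, not a measured one."""
    needed = estimate_weight_memory_gb(params_billions, bits_per_weight) * overhead
    return needed <= available_gb


if __name__ == "__main__":
    budget_gb = 24  # hypothetical memory budget
    for name, size_b in [("600B model", 600), ("32B model", 32)]:
        gb = estimate_weight_memory_gb(size_b, bits_per_weight=4)
        verdict = "fits" if fits(size_b, budget_gb, bits_per_weight=4) else "does not fit"
        print(f"{name} at 4-bit: ~{gb:.0f} GB of weights, {verdict} in {budget_gb} GB")
```

By this estimate, a 600B model needs roughly 300 GB for its weights even at 4-bit quantization, while a 32B model at 4-bit needs roughly 16 GB.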

https://huggingface.co/models

This is sorted by trending by default, which tends to show what's getting interest but not necessarily what's best.
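If trending isn't the ordering you want, a small sketch like the following builds a link to the model listing filtered by task and sorted differently. It assumes the site accepts `pipeline_tag` and `sort` query parameters (as its own filter links appear to use); the helper name `hf_models_url` is hypothetical, and the parameter names may change.

```python
from urllib.parse import urlencode


def hf_models_url(task: str, sort: str = "downloads") -> str:
    """Build a Hugging Face model-listing URL for a task (assumed query parameters)."""
    query = urlencode({"pipeline_tag": task, "sort": sort})
    return f"https://huggingface.co/models?{query}"


if __name__ == "__main__":
    print(hf_models_url("text-generation"))
```

Sorting by downloads or likes instead of trending is still only a proxy for quality, so it is best used alongside a benchmark.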

throwaw12 · 6h ago

> DeepSeek R1 is nice, but 600B isn't running on my hardware.

Yeah, this is my concern as well. Usually the top SOTA generic models are good at many tasks, but I can't quickly test them locally on my machine. Especially when I see claims that a 32B model is outperforming proprietary models in benchmarks, I really want to test it myself on my own tasks, but after some time these models drop out of the news/trends and become difficult to find.
