Qwen3 is substantially better in my local testing: it adheres to the prompt better (pretty much exactly for the 32B parameter variant, very impressive) and sounds more organic.

On SimpleBench, gpt-oss (120B) flopped hard, so it doesn't appear to be particularly good at logical puzzles either.

So presumably, this comes down to...

- training technique or data

- dimension

- lower number of large experts vs higher number of small experts
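
To make that last point a bit more concrete, here is a rough back-of-the-envelope sketch. Every number in it is hypothetical and is not the actual gpt-oss or Qwen3 configuration; it also assumes a gated feed-forward expert with three weight matrices and a plain linear router. It shows that two MoE layers can have nearly the same total and active parameter counts while differing hugely in how many distinct expert combinations the router can pick.

```python
from math import comb


def moe_layer_stats(d_model, d_expert, n_experts, top_k):
    """Rough parameter counts for one MoE feed-forward layer.

    Assumption: each expert is a gated FFN with three d_model x d_expert
    matrices, and the router is a single d_model x n_experts projection.
    """
    expert_params = 3 * d_model * d_expert
    router_params = d_model * n_experts
    return {
        "total_params": n_experts * expert_params + router_params,
        "active_params": top_k * expert_params + router_params,
        "routing_combinations": comb(n_experts, top_k),
    }


# Hypothetical configs, chosen so the expert weights match exactly.
few_large = moe_layer_stats(d_model=2048, d_expert=4096, n_experts=8, top_k=2)
many_small = moe_layer_stats(d_model=2048, d_expert=512, n_experts=64, top_k=16)

for name, stats in (("few large", few_large), ("many small", many_small)):
    print(name, stats)
```

Under these assumptions the only real differences are the router size and the number of possible expert subsets per token, so any quality gap between such layouts would have to come from how well routing and training exploit that extra granularity.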

jszymborski · 36m ago

If I had to make a guess, I'd say this has much, much less to do with the architecture and far more to do with the data and training pipeline. Many have speculated that gpt-oss has adopted a Phi-like synthetic-only dataset and focused mostly on gaming metrics, and I've found the evidence so far to be sufficiently compelling.

7moritz7 · 32m ago

That would be interesting. I've been a bit sceptical of the entire strategy from the beginning. If gpt-oss were actually as good as o3-mini, and in some cases o4-mini, outside of benchmarks, that would undermine OpenAI's API offering for GPT-5 nano and maybe mini too.

Edit: found this analysis, it's on the HN frontpage right now

> this thing is clearly trained via RL to think and solve tasks for specific reasoning benchmarks. nothing else.

https://x.com/jxmnop/status/1953899426075816164

CuriouslyC · 20m ago

The strategy of Phi isn't bad, it's just not general. It's really a model that's meant to be fine-tuned, but unfortunately fine-tuning tends to shit on RL'd behavior, so it ended up not being that useful. If someone made a Phi-style model with an architecture that was designed to take knowledge adapters/experts (i.e. a small MoE model designed to have separately trained networks plugged into it, with routing updates via a special LoRA), it'd actually be super useful.

homarp · 39m ago

"From GPT-2 to gpt-oss: Analyzing the Architectural Advances And How They Stack Up Against Qwen3"

GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2

Comments (5)