New #1 open-source AI Agent on SWE-bench Verified

28 points by laxyz | 15 comments | 5/22/2025, 10:22:43 AM | refact.ai ↗

Comments (15)

MukundMohanK · 7h ago
Between last April and now, SWE-bench scores have gone up from 25% to 70%.

Sure, models are being overfitted to the dataset. But with most of them performing similarly even across the hardest third-party benchmarks (think FrontierMath back in November versus now), we're closer than ever to a specialisation shift.

Hard to say at what %, but once code reviews get better, it's likely 2025 is the last year SWE is a sought-after job, on both the demand and supply side.

candiddevmike · 6h ago
SWE-bench scores, like a lot of other metrics for LLMs, are pretty divorced from reality IMO. It's a lot like learning only to pass tests versus actually understanding.

Once GenAI companies stop hiring SWEs, I'll believe the doomers.

harshitaneja · 6h ago
I help hire for a few clients as well as for my own small organization. We are already seeing the impact of these tools on our hiring. For the same responsibilities and tasks, we now require fewer people. For clients with less complex problems, we are able to manage similar work with 60% of the staffing originally planned. And that's when most of our work is mathematical modelling, heuristics, constraint programming, and the like. However, I don't foresee, at least for the next few years, a scenario where we don't hire developers at all, given that most hiring has shifted to senior developers only.
dingnuts · 3h ago
Being able to do more with fewer resources (which lowers costs) always increases demand enough to make up for the reduction in labor caused by the automation.

Analogy: when the chainsaw was invented, we didn't stop having lumberjacks; they just learned to use chainsaws.

MukundMohanK · 6h ago
Reality is here whether we like it or not - https://fred.stlouisfed.org/graph/?g=1DEP0
hackeman300 · 5h ago
Surely there are no other macroeconomic factors that could have played a role in this decline, too.
predkambrij · 6h ago
I would like to know why this post got flagged. Is it misleading, or is the software dangerous? If it's truly the #1 open-source agent on SWE-bench, that's quite impressive.
grammarxcore · 6h ago
> Many samples have an issue description that is underspecified, leading to ambiguity on what the problem is and how it should be solved.

OpenAI apparently tuned _basic discovery and refinement_ out of the tests so I don’t think this is a benchmark of anything useful. It can’t replace a human but can possibly make a human more productive.

https://openai.com/index/introducing-swe-bench-verified/

nateburke · 7h ago
Am I correct in understanding that SWE-bench is limited to python?
simonw · 7h ago
The core benchmark is only Python, but there is also SWE-bench Multimodal which uses JavaScript: https://arxiv.org/abs/2410.03859

And the new SWE-bench Multilingual (released a couple of weeks ago) covers nine programming languages: C, C++, Go, Java, JavaScript, TypeScript, PHP, Ruby, and Rust: https://www.swebench.com/multilingual.html

babushkaboi · 7h ago
Yeah, they're all Python at the moment.
laxyz · 8h ago
The full pipeline used for SWE-bench Verified is open-source: https://github.com/smallcloudai/refact-bench
amarcheschi · 7h ago
I think the title doesn't make it clear that the results were obtained with closed models.
brrrrrm · 7h ago
Open-source use of closed source models?
NicuCalcea · 7h ago
Looks like they support self-hosted models: https://docs.refact.ai/supported-models/#self-hosted-version