Show HN: Mini-swe-agent achieves 65% on SWE-bench in 100 lines of Python

5 points by lieret | 2 comments | 7/25/2025, 1:27:29 PM | github.com

Comments (2)

lieret · 17h ago
In 2024, we developed SWE-bench and SWE-agent at Princeton University and helped kickstart the coding agent revolution.

Back then, LMs were optimized to be great at chatting, but not much else. This meant that agent scaffolds had to get very creative (and complicated) to make LMs perform useful work.

But in 2025, LMs are actively optimized for agentic coding, and we ask:

*What is the simplest coding agent that could still score near SotA on the benchmarks?*

*Turns out, it just requires 100 lines of code!*

And this system still *resolves 65% of all GitHub issues in the SWE-bench Verified benchmark* with Sonnet 4 (for comparison, when Anthropic launched Sonnet 4, they reported 70% with their own scaffold, which was never made public).

Honestly, we're all pretty stunned ourselves. We've now spent more than a year developing SWE-agent, and would not have thought that such a small system could perform nearly as well.

I'll link to the project below (all open-source, of course). The hello world example is incredibly short & simple (and literally what gave us the 65%). But it is also meant as a serious command line tool + research project, so we provide a Claude Code-style UI & some utilities on top of that.
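If you're curious what such a loop can look like, here's a rough illustrative sketch of the pattern (this is not the actual mini-swe-agent code; the client, model name, prompts, step limit, and completion marker are all stand-ins): the LM emits one shell command per turn, we execute it, and the output becomes the next user message.

```python
# Illustrative sketch of a minimal shell-only agent loop.
# NOT the real mini-swe-agent code; model, prompts, and the
# TASK_DONE marker are assumptions for the example.
import re
import subprocess

from openai import OpenAI  # any chat-completions client would do

client = OpenAI()
messages = [
    {"role": "system", "content": (
        "You are a coding agent. Reply with exactly one bash command "
        "in a ```bash ...``` block per turn. Say TASK_DONE when finished."
    )},
    {"role": "user", "content": "Fix the failing test in this repo."},
]

for _ in range(30):  # hard step limit
    reply = client.chat.completions.create(
        model="gpt-4o", messages=messages
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    if "TASK_DONE" in reply:
        break
    match = re.search(r"```bash\n(.*?)```", reply, re.DOTALL)
    if not match:
        messages.append(
            {"role": "user", "content": "Please reply with one ```bash``` block."}
        )
        continue
    # Run the proposed command and feed its output back to the model.
    result = subprocess.run(
        match.group(1), shell=True, capture_output=True, text=True, timeout=120
    )
    messages.append({"role": "user", "content": result.stdout + result.stderr})
```

The point is that with a model trained for agentic coding, the scaffold barely needs to do anything beyond executing commands and keeping a linear history.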

We have some team members from Princeton/Stanford here today, let us know if you have any questions/feedback :)

Oras · 16h ago
Is there an option to learn from mistakes? Most coding agents I've tried, including the Sonnet 4-based one, will make the same mistake again and again in a new chat.

It would be great to have the agent add a memory (even a local one) to avoid repeating mistakes, check for new versions of libraries, and write a list of tasks before execution (similar to Kiro and Trae SOLO); a rough sketch of the memory idea follows.
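Even something as simple as a local lessons file prepended to the next session's system prompt would help. A minimal sketch of that idea (hypothetical, not a feature mini-swe-agent ships; file path and helper names are made up):

```python
# Hypothetical local "memory" for a coding agent: persist lessons
# from past sessions and prepend them to future system prompts.
import json
from pathlib import Path

MEMORY_FILE = Path.home() / ".agent_memory.json"  # assumed location

def load_lessons() -> list[str]:
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []

def add_lesson(lesson: str) -> None:
    # Record a mistake/fix pair once, deduplicated.
    lessons = load_lessons()
    if lesson not in lessons:
        lessons.append(lesson)
        MEMORY_FILE.write_text(json.dumps(lessons, indent=2))

def build_system_prompt(base: str) -> str:
    # Prepend accumulated lessons so a new chat starts with them.
    lessons = load_lessons()
    if not lessons:
        return base
    return base + "\n\nLessons from past sessions:\n" + "\n".join(
        f"- {lesson}" for lesson in lessons
    )

# Usage after a failed attempt, e.g.:
# add_lesson("pandas>=2.0 removed DataFrame.append; use pd.concat instead")
```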