There's been a series of agents recently (Claude Code, Manus, Deep Research) that execute tasks over longer time horizons particularly well
At the core of it, it's just an LLM running in a loop calling tools... but when you try to do this naively (or at least, when I try to do it), the LLM struggles with long/complex tasks
So how do these other agents accomplish it?
These agents all do similar things, namely:
1. They use a planning tool
2. They use sub agents
3. They use a file-system-like thing to offload context
4. They have a detailed system prompt (prompting isn't dead!)
I don't think any of these things individually is novel... but I also think they are not super commonplace when building agents. And the combination of them is (I think) an interesting insight!
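In case it helps ground this, here's roughly the shape I mean, as a Python sketch. All the names are made up for illustration; this is not the actual deepagents API:

    import json

    FILES = {}  # (3) a file-system-like store the agent can offload context into

    def call_llm(messages, tools):
        """Stand-in for a real chat-completions call that may return tool calls."""
        raise NotImplementedError("wire up your model provider here")

    def write_todos(todos):
        # (1) the planning tool: no side effects; its value is that the plan
        # now sits in the message history and keeps later turns on track
        return "Plan:\n" + "\n".join(f"- {t}" for t in todos)

    def write_file(path, content):
        FILES[path] = content
        return f"wrote {len(content)} chars to {path}"

    def read_file(path):
        return FILES.get(path, f"{path} not found")

    def spawn_subagent(instructions):
        # (2) a sub-agent is just a fresh loop with its own clean context window
        return run_agent(instructions)

    TOOLS = {f.__name__: f for f in (write_todos, write_file, read_file, spawn_subagent)}
    SYSTEM_PROMPT = "..."  # (4) the long, detailed system prompt does a lot of the work

    def run_agent(task):
        messages = [{"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": task}]
        while True:  # the whole thing really is just a loop
            reply = call_llm(messages, tools=list(TOOLS))
            if not reply.get("tool_calls"):  # no tool calls means final answer
                return reply["content"]
            messages.append(reply)
            for call in reply["tool_calls"]:
                result = TOOLS[call["name"]](**json.loads(call["arguments"]))
                messages.append({"role": "tool", "name": call["name"], "content": str(result)})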
Would love any feedback :)
web-cowboy · 1h ago
As I think through this, I agree with others mentioning that "deep agents" still sounds a lot like agents+tools. I guess the takeaway for me is:
1. You need a good LLM for base knowledge.
2. You need a good system prompt to guide/focus the LLM (create an agent).
3. If you need some functionality that doesn't make any decisions, create a tool.
4. If the agent + tools flow gets too unwieldy, break it down into smaller domains by spawning sub-agents with focused prompts and (fewer?) tools.
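To make point 4 concrete, here's a minimal sketch of the sub-agent-as-tool pattern (hypothetical names, not any particular framework's API):

    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class Agent:
        system_prompt: str
        tools: dict = field(default_factory=dict)

        def run(self, task: str) -> str:
            ...  # the usual LLM-in-a-loop goes here; omitted for brevity

    def as_tool(agent: Agent) -> Callable[[str], str]:
        # The parent never sees the sub-agent's transcript, only its final
        # answer: that's the context isolation.
        return lambda task: agent.run(task)

    researcher = Agent("You research. Cite sources.", {"web_search": ...})
    coder = Agent("You write and test code. No prose.", {"run_tests": ...})

    parent = Agent(
        "You plan and delegate; don't do the domain work yourself.",
        tools={"research": as_tool(researcher), "code": as_tool(coder)},
    )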
_andrei_ · 1h ago
ah, deep agents = agents with planning + agents as tools => so regular agents.
i hate how LangChain has always tried to make things that are simple seem very complicated, and all the unnecessary new terminology and concepts they've pushed, but whatever sells LangSmith.
Still a work in progress, but I'm already using it to code itself. Feedback welcome.
shmatt · 3h ago
At least from what I noticed, Junie from JetBrains was the first to use a very high-quality to-do list, and it quickly became my favorite
I haven't used it since it became paid, but back then Junie was slow and thoughtful, while Cursor was constantly rewriting files that worked fine, and Claude was somewhere in the middle
tough · 2h ago
Cursor added a UI for the todo list and encourages its agent to use it (it's great UX, but you can't really see a file for it)
Kiro from Amazon does both tasks (in tasks.md) and specs.
Soon there will be too many tools; choose what works for you
jayshah5696 · 2h ago
sub-agents adding isolated context is the real deal; the rest is just a LangGraph ReAct agent
PantaloonFlames · 52m ago
This is valuable but not really a novel idea.
storus · 1h ago
"I hacked on an open source package (deepagents) over the weekend." Thanks but no thanks.
epolanski · 53m ago
Some of the biggest software in use today was hacked together over a few days in its first versions. Git is a famous example.
yawnxyz · 1h ago
most of these agents are still fundamentally simple while loops; it shouldn't really take longer than a weekend to get one built
SCUSKU · 1h ago
Hacker hacks on project and gets posted to Hacker News.
Commenter on Hacker News: No thanks, no hacking please.
storus · 1h ago
It's on LangChain's official page. LangChain is a framework that looks like it was hacked together over a weekend by a fresh grad, one that has brought a lot of pain to agentic development, and this just feels like piling more pain on top of it.
seabass · 3h ago
Is there more info on how the todo list tool is a noop? How exactly does that work?
It is relatively easy to get the agent to use it; most of the work for us is surfacing it in the UI.
lmeyerov · 2h ago
i think he means it's 'just' a thin concat
most useful prompt stuff seems 'simple' to implement ultimately, so it's more impressive to me that such a simple idea as a TODO list goes so far!
(agent frameworks ARE hard in serious settings, don't get me wrong, just for other reasons. ex: getting the right mix & setup is devilishly hard, as are infra layers below like multitenancy, multithreading, streaming, cancellation, etc.)
re: the TODO list, strong agree on criticality. it's flipped how we do louie.ai for stuff like speed running security log analysis competitions. super useful for preventing CoT from going off the rails after only a few turns.
a fun 'aha' for me there: nested todos are great (A.2.i...), and easy for the LLM b/c they're linearized anyway
You can see how we replace Claude Code's prompts for our own internal vibe-coding usage, which helps with Claude's constant compactions as a heavy user (= assuages the issue of the ticking timer for a lobotomy): https://github.com/graphistry/louie-py/blob/main/ai/prompts/...
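for instance, a "nested" todo list is still just flat text to the model. an illustrative example (not louie.ai's actual format):

    # Illustrative only: "nested" todos are an indented, numbered flat string,
    # so the LLM reads and rewrites them as ordinary linear text.
    todos = """\
    A. triage alerts [done]
      A.1 pull last 24h of auth logs [done]
      A.2 cluster failed logins [in progress]
        A.2.i by source IP [in progress]
        A.2.ii by user agent [pending]
    B. write up findings [pending]
    """
    print(todos)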
JyB · 3h ago
Same question. I don’t understand what they mean by that.
It obviously seems pretty central to how Claude Code is so effective.
kjhughes · 3h ago
I thought they meant that it's a noop as a tool in the sense that it takes no external action. It seems nonetheless effective as a means of organizing reasoning and expressing status along the way.
kobstrtr · 3h ago
just for chain of thought, TodoWrite would be sufficient as a tool, wouldn't it?
TrainedMonkey · 2h ago
My understanding is that it is basically a prompt about making a TODO list.
ttul · 2h ago
The context will contain a record that the tool call took place. The todo list is never actually fetched.
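i.e., something like this minimal sketch (a guess at the shape, not Claude Code's actual implementation):

    # A guess at the shape of a "no-op" todo tool, not Claude Code's actual
    # implementation. It has no side effects; its entire value is the
    # structured record it leaves in the transcript, which the model then
    # re-reads on every subsequent turn.
    def todo_write(todos: list[dict]) -> str:
        lines = [f"[{t['status']}] {t['content']}" for t in todos]
        return "Todos updated:\n" + "\n".join(lines)

    # Example turn: the model calls the tool with its full, current plan state.
    print(todo_write([
        {"status": "done", "content": "read the failing test"},
        {"status": "in_progress", "content": "patch the parser edge case"},
        {"status": "pending", "content": "run the full test suite"},
    ]))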
kobstrtr · 3h ago
if it was a noop, I feel like there wouldn't be a need to have TodoRead as a tool, since TodoWrite exists.
Would love to get more info on whether this is really a noop
aabhay · 3h ago
My guess is the todo list is carried across “compress” points where the agent summarizes and restarts with fresh context + the summary
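If that guess is right, the mechanics would be something like this (purely speculative sketch):

    # Purely speculative sketch of the guess above: when the context window
    # fills up, replace the transcript with a summary, but carry the todo
    # list forward verbatim into the fresh context.
    def compress(messages: list[dict], todo_state: str, summarize) -> list[dict]:
        summary = summarize(messages)  # an LLM-written recap of the work so far
        return [
            {"role": "system", "content": "Summary of prior work:\n" + summary},
            {"role": "system", "content": "Current todo list:\n" + todo_state},
        ]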
The author has done a pretty good job of reverse engineering Claude Code and explaining the architecture.
update: changed the link to a better repo
This is a better repo to learn about Claude Code internals:
https://github.com/ghuntley/claude-code-source-code-deobfusc...