I'm building a simple agent accessible over SMS for a family member. One of their use cases is finding recipes. A problem I ran into was that doing a web search for recipes would pull tons of web pages into the context, effectively clobbering the system prompt that told the agent to format responses in a manner suited for SMS. I solved this by creating a recipe tool that uses a sub-agent to do the web search and return the most promising recipe to the main agent. When the main agent uses this tool instead of performing the web search itself, it is successfully able to follow the system prompt's directions to format and trim the recipe for SMS. Using this sub-agent to prevent information from entering the context dramatically improved the quality of responses. More context is not always better!
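The shape of the recipe tool is roughly this (a simplified sketch, not my actual code; `call_llm` and `web_search` stand in for whatever model and search APIs you use):

```python
def call_llm(system: str, user: str) -> str:
    """Stand-in for whatever LLM provider API you use."""
    raise NotImplementedError

def web_search(query: str, n: int = 5) -> list[str]:
    """Stand-in for whatever search API you use; returns page texts."""
    raise NotImplementedError

def find_recipe(dish: str) -> str:
    """Tool exposed to the main agent. The sub-agent sees the noisy pages;
    the main agent only ever sees the single recipe returned here."""
    pages = web_search(f"{dish} recipe")
    return call_llm(
        system="You are a recipe picker. Return only the single most "
               "promising recipe: title, ingredients, numbered steps.",
        user="\n\n---\n\n".join(pages),
    )
```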
I bring this up because this article discusses context management mostly in terms of context windows having a maximum size. I think that context management is far more than that. I'm still new to this building agents thing, but my experience suggests that context problems start cropping up well before the context window fills up.
_0ffh · 9m ago
You mean sub-agent as in the formatting agent calls on the search-and-filter agent? In that case you might just make a pipeline. Use a search agent, then a filter agent (or maybe only one search-and-filter agent), then a formatting agent. Lots of tasks work better with a fixed pipeline than with freely communicating agents.
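A minimal sketch of that pipeline shape, with hypothetical `call_llm` and `web_search` stand-ins rather than a real SDK:

```python
def call_llm(system: str, user: str) -> str:       # stand-in for a model API
    raise NotImplementedError

def web_search(query: str) -> list[str]:           # stand-in for a search API
    raise NotImplementedError

def recipe_pipeline(dish: str) -> str:
    pages = web_search(f"{dish} recipe")                           # search stage
    best = call_llm("Pick the single most promising recipe.",      # filter stage
                    "\n\n---\n\n".join(pages))
    return call_llm("Rewrite for SMS: short, plain text.", best)   # format stage
```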
edoceo · 1h ago
Are you in the USA? How do you get around the 10DLC limits on typical SMS APIs (e.g. Twilio)? Or did you go through that process (which seems like a lot for a private use case)?
colonCapitalDee · 1h ago
I am in the USA! Although these days that exclamation point doesn't feel great...
I'm using an old Android phone (Pixel 2 from 2017), a $5-a-month unlimited SMS plan from Tello, and https://github.com/capcom6/android-sms-gateway. For bonus points (I wanted to roll my own security, route messages from different numbers to prod and ppe instances of my backend, and dedup messages) I built a little service in Go that acts as an intermediary between my backend and android-sms-gateway. I deploy this service to my android device using ADB, android-sms-gateway talks to it, and it talks to my backend. I also rooted the android device so I could disable battery management for all apps (don't do this if you want to walk around with the phone, of course). It works pretty well!
I plan to open-source this eventually TM, but first I need to decouple my personal deployment infra from the bits useful to everyone else
jmull · 10m ago
> By using React, you embrace building applications with a pattern of reactivity and modularity, which people now accept to be a standard requirement, but this was not always obvious to early web developers.
This is quite a whopper. For one thing, the web started off reactive. It did take a while for a lot of people to figure out how to bring that to client-side rendering in a reasonably decent way (though, I'm sorry, IMO that doesn't actually include React). Second, "modularity" has been a thing for quite some time before the web existed. (If you want to get down to it, separating and organizing your processes in information systems predates computers.)
faangguyindia · 1h ago
There’s both “no multi-agent system” and “multi-agent system,” depending on how you look at it. In reality, you’re always hitting the same /chat/completion API, which itself has no awareness of any agents. Any notion of an agent comes purely from the context and instructions you provide.
Separating agents has a clear advantage. For example, suppose you have a coding agent with a set of rules for safely editing code. Then you also have a code search task, which requires a completely different set of rules. If you try to combine 50 rules for code editing with 50 rules for code searching, the AI can easily get confused.
It’s much more effective to delegate the search task to a search agent and the coding task to a code agent. Think of it this way: when you need to switch how you approach a problem, it helps to switch to a different “agent”, a different mindset with rules tailored for that specific task.
Do I need to think differently about this problem? If yes, you need a different agent!
So yes, conceptually, using separate agents for separate tasks is the better approach.
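A hedged sketch of that separation, with made-up rule sets, a crude keyword router, and a hypothetical `call_llm` wrapper, just to make the idea concrete:

```python
def call_llm(system: str, user: str) -> str:   # stand-in for a model API
    raise NotImplementedError

# Each "agent" is the same completion API with a different, smaller rule set.
EDIT_RULES = "You edit code. Make minimal diffs. Never touch unrelated files..."
SEARCH_RULES = "You search code. Return only file paths and line ranges..."

def handle(task: str) -> str:
    # Crude router: "do I need to think differently?" -> pick a different rule set.
    if task.lower().startswith(("find", "where", "search")):
        return call_llm(SEARCH_RULES, task)
    return call_llm(EDIT_RULES, task)
```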
eab- · 3m ago
There's both "no multi-program system" and "multi-program system", depending on how you look at it. In reality, you're always executing the same machine code, which itself has no awareness of programs.
datadrivenangel · 19m ago
Calling a different prompt template an 'agent' doesn't help communicate meaningful details about an overall system design. Unnecessary verbiage or abstraction in this case.
CuriouslyC · 1h ago
We're in the context engineering stone age. You, the engineer, shouldn't be trying to curate context; you should be building context optimization/curation engines. You shouldn't be passing agents context like messages; they should share a single knowledge store with the parent, and the context optimizer should just optimally pack their context for the task description.
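To make that concrete, a toy version of the packing step might look like the following; the scoring and token counting are stand-in heuristics, and a real engine would use embeddings or BM25 plus an actual tokenizer:

```python
def pack_context(task: str, store: list[str], budget_tokens: int = 4000) -> str:
    """Greedily pack the most task-relevant items from a shared knowledge
    store into a token budget. Purely illustrative scoring/counting."""
    def score(item: str) -> float:
        # Stand-in relevance score: word overlap between task and item.
        return len(set(task.lower().split()) & set(item.lower().split()))

    def tokens(text: str) -> int:
        return len(text) // 4  # crude heuristic, not a real tokenizer

    packed, used = [], 0
    for item in sorted(store, key=score, reverse=True):
        cost = tokens(item)
        if used + cost <= budget_tokens:
            packed.append(item)
            used += cost
    return "\n\n".join(packed)
```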
hansvm · 52m ago
You're not wrong. This is just a storage/retrieval problem. But ... the current systems have limits. If you want commercial success in <3yrs, are any of those ideas remotely viable?
CuriouslyC · 48m ago
Oh yeah, and if you tried to do one now it'd be a bad idea because I'm almost done :)
The agentic revolution is very different from the chatbot/model revolution because agents aren't a model problem, they're a tools/systems/process problem. Honestly the models we have now are very close to good enough for autonomous engineering, but people aren't giving them the right tools, the right processes, we aren't orchestrating them correctly, most people have no idea how to benchmark them to tune them, etc. It's a new discipline and it's very much in its infancy.
nickreese · 1h ago
Is there a framework for this?
CuriouslyC · 52m ago
I have one that's currently still cooking. I have good experimental validation for it, but I need to go back and tune the latency and improve the install story. It should help any model quite a bit, but you have to hack other agents to integrate it into their API call machinery; I have a custom agent I've built that makes it easy to inject, though.
It's really not hard. It's just all the IR/optimization machinery we already have applied to a shared context tree with locality bias.
ramesh31 · 1m ago
Don't listen to anyone who tells you how to build an agent. This stuff has never existed before in the history of the world, and literally everyone is just figuring it out as we go. Work from the simplest basic building blocks possible and do what works for your use case. Eventually things will be figured out, and you can worry about "best practices" then. But it's all just conjecture right now.
avereveard · 12m ago
"I designed a bad system so all system of these class must be bad"
They're really handing out the AI domain to anyone these days.
curl-up · 1h ago
In the context compression approach, why aren't the agents labeled as subagents instead? The compressed context is basically a "subtask".
This is my main issue with all these agentic frameworks - they always conveniently forget that there is nothing "individual" about the thing they label "an agent" and draw a box around.
Such "on demand" agents, spawned directly from previous LLM output, are never in any way substantially different from dynamic context compression/filtering.
I think the only sensible framework is to think in terms of tools, with clear interfaces, and a single "agent" (single linear interaction chain) using those tools towards a goal. Such tools could be LLM-based or not. Forcing a distinction between a "function tool" and an "agent that does something" doesn't make sense.
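A minimal sketch of that single-interface view, assuming a hypothetical `call_llm` wrapper: the agent dispatches through one tool signature and never needs to care whether the tool is deterministic or LLM-backed.

```python
from typing import Callable, Dict

def call_llm(system: str, user: str) -> str:   # stand-in for a model API
    raise NotImplementedError

Tool = Callable[[str], str]

def word_count(text: str) -> str:              # deterministic tool
    return str(len(text.split()))

def summarize(text: str) -> str:               # LLM-backed "sub-agent" tool
    return call_llm("Summarize in two sentences.", text)

TOOLS: Dict[str, Tool] = {"word_count": word_count, "summarize": summarize}

def dispatch(name: str, arg: str) -> str:
    return TOOLS[name](arg)                    # same interface either way
```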
peab · 1h ago
Yeah, I agree with thinking of things as a single agent + tools.
From the perspective of the agent, whether the tools are deterministic functions, or agents themselves, is irrelevant.
jskalc92 · 1h ago
I think the most common implementation of "subagents" doesn't get the full context of a conversation, rather just an AI-generated command.
Here the task is fulfilled with the full context so far, and then compressed. Might work better IMO.
adastra22 · 1h ago
In my experience it does not work better. There are two context-related values to subagent tool calls: (1) the subagent trials and deliberations don’t poison the caller’s context [this is a win here]; and (2) the called agent isn’t unduly influenced by the caller’s context. [problem!]
The latter is really helpful for getting a coding assistant to settle on a high quality solution. You want critic subagents to give fresh and unbiased feedback, and not be influenced by arbitrary decisions made so far. This is a good thing, but inheriting context destroys it.
adastra22 · 1h ago
> As of June 2025, Claude Code is an example of an agent that spawns subtasks. However, it never does work in parallel with the subtask agent, and the subtask agent is usually only tasked with answering a question, not writing any code.
Has this changed since June? Because I’ve been experimenting over the last month with Claude Code subagents that work in parallel and agents which write code (doing both simultaneously is inadvisable for obvious reasons, at least without workspace separation).
clbrmbr · 49m ago
I’ve been quite successful since June doing parallel edits just on different components within the same codebase. But I’ve not been able to do it with “auto-accept” because I need a way to course correct if one of the agents goes off the rails.
mreid · 1h ago
Is it concerning to anyone else that the "Simple & Reliable" and "Reliable on Longer Tasks" diagrams look kind of like the much maligned waterfall design process?
CuriouslyC · 1h ago
Waterfall is just a better process with agents. Agile is garbage when inserting yourself in the loop causes the system to drop to 10% velocity.
amelius · 1h ago
It looks more like alchemy, tbh.
DarkNova6 · 1h ago
To me it seems more like the typical trap of a misfit bounded context.
sputknick · 1h ago
This is very similar to the conclusion I have been coming to over the past 6 months. Agents are like really unreliable employees that you have to supervise and correct so often that it's a waste of time to delegate to them. The approach I'm trying to develop for myself is much more human-centric. For now I just directly supervise all actions done by an AI, but I would like to move to something like this: https://github.com/langchain-ai/agent-inbox where I, as the human, am the conductor of the work agents do, and they then check in with me for further instructions or correction.
Anecdotally Devin has been one of the worst coding agents I tried, to the point where I didn’t even bother asking for my unused credits to be refunded. That was 2 months ago, so things may have changed.
wewtyflakes · 1h ago
This resonates heavily with our experience. We ended up using one agent + actively managed context, with the smartness baked into how we manage that context for that one agent, rather than attempting to manage expectations/context across a team of agents.
skissane · 58m ago
> Principle 1: Share context, and share full agent traces, not just individual messages
I was playing around with this task: give a prompt to a low-end model, get the response, and then get the higher-end model to evaluate the quality of the response.
And one thing I've noticed is that while the higher-end model sometimes detects when the low-end model misinterprets the prompt (e.g. it blatantly didn't understand some aspect of it and just hallucinated), it still often allows itself to be controlled by the low-end model's framing... e.g. if the low-end model takes a negative attitude to an ambiguous text, the high-end model will propose moderating the negativity... but what it doesn't realise is that, given the prompt without the low-end model's response, it might not have adopted that negative attitude at all.
So one idea I had... a tool which enables the LLM to get its own "first impression" of a text... so it can give itself the prompt, and see how it would react to it without the framing of the other model's response, and then use that as additional input into its evaluation...
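Roughly, with `call_llm` as a stand-in for the model API:

```python
def call_llm(system: str, user: str) -> str:   # stand-in for a model API
    raise NotImplementedError

def first_impression(prompt: str) -> str:
    # Fresh context: the evaluator never sees the other model's response here.
    return call_llm("Give your own brief reading of this prompt.", prompt)

def evaluate(prompt: str, low_end_answer: str) -> str:
    fresh = first_impression(prompt)
    return call_llm(
        "Judge the answer. Use your unframed first impression as a check "
        "against being anchored by the answer's own framing.",
        f"PROMPT:\n{prompt}\n\nANSWER:\n{low_end_answer}\n\n"
        f"FIRST IMPRESSION:\n{fresh}",
    )
```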
So this is an important point this post doesn't seem to understand – sometimes less is more, sometimes leaving stuff out of the context is more useful than putting it in
> It turns out subagent 1 actually mistook your subtask and started building a background that looks like Super Mario Bros. Subagent 2 built you a bird, but it doesn’t look like a game asset and it moves nothing like the one in Flappy Bird. Now the final agent is left with the undesirable task of combining these two miscommunications
It seems to me there is another way to handle this... allow the final agent to go back to the subagent and say "hey, you did the wrong thing, this is what you did wrong, please try again"... maybe with a few iterations it will get it right... at some point, you need to limit the iterations to stop an endless loop, and either the final agent does what it can with a flawed response, or escalate to a human for manual intervention (even the human intervention can be a long-running tool...)
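Something like this bounded loop, with the subagent, the critic, and the escalation hook all passed in as assumed callables rather than any particular framework's API:

```python
from typing import Callable

def revise_until_ok(task: str,
                    run_subagent: Callable[[str], str],
                    critique: Callable[[str, str], str],
                    escalate: Callable[[str, str], str],
                    max_iters: int = 3) -> str:
    """Retry the subagent with critique a bounded number of times,
    then hand off (flawed result and all) to the escalation hook."""
    result = run_subagent(task)
    for _ in range(max_iters):
        feedback = critique(task, result)      # e.g. an LLM judge or a test suite
        if feedback == "OK":
            return result
        result = run_subagent(
            f"{task}\n\nYour previous attempt was rejected because:\n{feedback}"
        )
    return escalate(task, result)              # human-in-the-loop / long-running tool
```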
ggandhi · 50m ago
"It is now 2025 and React (and its descendants) dominates the way developers build sites and apps." Is there any research which tells react is dominating or most of the internet is not vanilla HTML but react?
CityOfThrowaway · 5m ago
There are two ways to answer this:
1. On one hand, walled gardens like Facebook, Instagram, YouTube, etc, are most of the internet and they decidedly use React (or similar) frameworks. So from that perspective, the statement is sorta trivially true.
2. There may well be a horde of websites that are pure HTML rendering. But, those sites are not largely being developed by developers – they are being generated by platforms (like Squarespace, etc.) so are out of the scope of "sites and apps built by developers"
All this is stated without data, of course.
behnamoh · 2h ago
How is this fundamentally any different than Erlang/Elixir concepts of supervisors controlling their child processes? It seems like the AI industry keeps re-discovering several basic techniques that have been around since the 80s.
I'm not surprised—most AI "engineers" are not really good software engineers; they're often "vibe engineers" who don't read academic papers on the subject and keep re-inventing the wheel.
If someone asked me why I think there's an AI bubble, I'd point exactly to this situation.
madrox · 1h ago
Software engineering in general is pretty famous for unironically being disdainful of anything old while simultaneously reinventing the past. This new wave is nothing new in that regard.
I'm not sure that means the people who do this aren't good engineers, though. If someone rediscovers something in practice rather than through learning theory, does that make them bad at something, or simply inexperienced? I think it's one of the strengths of the profession that there isn't a singular path to reach the height of the field.
ramchip · 1h ago
I've done a lot of Erlang and I don't see the relation? Supervisors are an error isolation tool, they don't perform the work, break it down, combine results, or act as a communication channel. It's kind of the point that supervisors don't do much so they can be trusted to be reliable.
jll29 · 1h ago
Yes, people re-discover stuff, mostly because no one reads older papers. I also thought of Erlang and OAA.
In the early 2000s, we used Open Agent Architecture (OAA) [1], which had a beautiful (declarative) Prolog-like notation for writing goals, and the framework would pick & combine the right agents (all written in different languages, but implementing the OAA interface through proxy libraries) to achieve the specified goals.
This was all on boxes within the same LAN, but conceptually, this could have been generalized.
[1] https://medium.com/dish/75-years-of-innovation-open-agent-ar...
Apart from requiring entirely the opposite solution?
With respect, if there's an AI bubble, I can't see it for all the sour grapes, every time it's brought up, anywhere.
antonvs · 1h ago
A lot of it seems to be resistance to change. People are afraid their skillset may be losing relevance, and instead of making any effort to adapt, they try to resist the change.
WalterSear · 33m ago
I suspect there's more to it than that. Some people are sprinting with this stuff, but it seems that many more are bouncing off it, bruised. It's a tell. Something is different.
It's an entirely new way of thinking, nobody is telling you the rules of the game. Everything that didn't work last month works this month, and everything you learned two months ago, you need to throw away. Coding assistants are inscrutable, overwhelming and bristling with sharp edges. It's easier than ever to paint yourself into a corner.
Back when it took weeks to put out a feature, you were insulated from the consequences of bad architecture, coding and communication skills: by the time things got bad enough to be noticed, the work had been done months ago and everyone on the team had touched the code. Now you can see the consequences of poor planning, poor skills, poor articulation being run to their logical conclusion in an afternoon.
I'm sure there are more reasons.
downrightmike · 1h ago
I blame managers that get all giddy about reducing head count. Sure, this year you get a -1% on developer time (seniors believe they get a 20% increase when it's really a decrease for using AI)
But then next year and the year after, the technical debt will be to the point where they just need to throw out the code and start fresh.
Then the head count must go up. Typical short term gains for long term losses/bankruptcy
antonvs · 1h ago
> seniors believe they get a 20% increase when its really a decrease for using AI
There’s no good evidence to support that claim. Just one study which looked at people with minimal AI experience. Essentially, the study found that effective use of AI has a learning curve.
Der_Einzige · 1h ago
If you don't use the phrase "structured generation" or "constrained generation" in your discussion of how to build Agents, you're doing it wrong.
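For what it's worth, the cheapest approximation of that (validate-and-retry against a JSON schema, using the real `jsonschema` library and a hypothetical `call_llm` wrapper) looks roughly like this; proper constrained decoding would instead enforce the grammar at sampling time:

```python
import json
import jsonschema

def call_llm(system: str, user: str) -> str:   # stand-in for a model API
    raise NotImplementedError

SCHEMA = {
    "type": "object",
    "properties": {"tool": {"type": "string"}, "args": {"type": "object"}},
    "required": ["tool", "args"],
}

def structured_call(task: str, retries: int = 3) -> dict:
    for _ in range(retries):
        raw = call_llm('Reply with JSON only: {"tool": ..., "args": ...}', task)
        try:
            obj = json.loads(raw)
            jsonschema.validate(obj, SCHEMA)   # raises on schema mismatch
            return obj
        except (json.JSONDecodeError, jsonschema.ValidationError):
            continue
    raise ValueError("model never produced valid structured output")
```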
photochemsyn · 1h ago
Don't hide your content from people using NoScript, how about that for starters...
And oh great, another Peter Thiel company booted to the top of HN, really?
> "Cognition AI, Inc. (also known as Cognition Labs), doing business as Cognition, is an artificial intelligence (AI) company headquartered in San Francisco in the US State of California. The company developed Devin AI, an AI software developer...Originally, the company was focused on cryptocurrency, before moving to AI as it became a trend in Silicon Valley following the release of ChatGPT...
With regards to fundraising, the company was backed by Peter Thiel's Founders Fund which provided $21 million of funding to it in early 2024, valuing the company at $350 million.[2] In April 2024, Founders Fund led a $175 million investment into Cognition valuing the company at $2 billion making it a Unicorn."
The bubble's gonna pop, and you'll have so much egg on your face. This stuff is just compilers with extra compute and who got rich off compilers? VC people...
CuriouslyC · 59m ago
Counterpoint, we'll crack autonomous coding within ~3 years, and with increases in token efficiency the speed of software development is going to go crazy. Project managers and salespeople are gonna take over big tech.
pellmellism · 48m ago
are we calling customers project managers now?
jwpapi · 1h ago
Why is this article on top of HN? This is nothing breaking/new/interesting/astonishing.
The principles are super basic to get the first time you build an agent.
The real problem is getting reliability. If you have reliability and clearly defined input and output, you can easily go parallel.
This seems like bad 5th-class homework.
makk · 41m ago
It doesn’t feel like news, yeah.
I would emphasize, though, that getting clearly defined input _that remains stable_ is hard. Often something is discovered during implementation of a task that informs changes in other task definitions. A parallel system has to deal with this or the results of the parallel tasks diverge.
ramchip · 51m ago
Personally I found the article informative and well-written. I had been wondering for a while why Claude Code didn't more aggressively use sub-agents to split work, and it wasn't obvious to me (I don't build agents for a living).
clbrmbr · 51m ago
Perhaps because the author is arguably the most prominent agent builder outside of the big labs.