Show HN: An MCP Gateway to block the lethal trifecta

29 points by 76SlashDolphin | 14 comments | 9/12/2025, 3:22:00 PM | github.com
Hi there, some friends and I were inspired by Simon Willison's recent post on the "lethal trifecta" (https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/) and started building a gateway to defend against it.

The idea: instead of connecting an LLM directly to multiple MCP servers, you point them all through a Gateway.

The Gateway:

- Connects to each MCP server and inspects their tools + requirements

- Classifies tools along the "trifecta" axes (private data access, untrusted content, external comms)

- When all three conditions are about to align in a single session, the Gateway blocks the last step and tells the LLM to show a warning instead.

That way, before anything dangerous can happen, the user is nudged to review the situation in a web dashboard.
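The session-level check described above can be sketched roughly like this. This is a hypothetical illustration, not open-edison's actual code; the flag names and `SessionState` type are made up for the example:

```python
from dataclasses import dataclass

@dataclass
class SessionState:
    private_data: bool = False       # has the session accessed private data?
    untrusted_content: bool = False  # has it ingested untrusted content?
    external_comms: bool = False     # has it communicated externally?

def would_complete_trifecta(state: SessionState, tool_flags: set) -> bool:
    """True if calling a tool with these flags would align all three axes."""
    after = {
        "private_data": state.private_data or "private_data" in tool_flags,
        "untrusted_content": state.untrusted_content or "untrusted_content" in tool_flags,
        "external_comms": state.external_comms or "external_comms" in tool_flags,
    }
    return all(after.values())
```

When this returns true for the tool about to be called, the gateway blocks that last step and tells the LLM to surface a warning instead.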

We'd love for the HN community to try it out: https://github.com/Edison-Watch/open-edison

Any feedback very welcome - we'll be around in the thread to answer questions.

Comments (14)

bradleybuda · 52m ago
I think the "lethal trifecta" framing is useful and I'm glad attempts are being made at this! But there are two big, hard-to-solve problems here:

1. The "lethal trifecta" is also the "productive trifecta" - people want to be able to use LLMs to operate in this space since that's where much of the value is; using private / proprietary data to interact with (do I/O with) the real world.

2. I worry that there will soon be (if not already) a fourth leg to the stool - latent malicious training within the LLMs themselves. I know the AI labs are working on this, but trying to ferret out Manchurian Candidates embedded within LLMs may very well be the greatest security challenge of the next few decades.

76SlashDolphin · 26m ago
Those are really good points and we do have some plans for them, mainly on the first topic. The UX we're envisioning for our gateway is that it starts out very defensive, but whenever it detects a trifecta you can mark it as a false positive. Over time the gateway gets trained to be exactly as permissive as the user wishes, with only the rare false positive. You can already do that today: you get a web notification when the gateway detects a trifecta, and clicking into it takes you to a menu to approve/deny that flow if it occurs in the future. Granted, this can make the gateway overly permissive, but we have plans for improving the granularity of these rules.

Regarding the second point, that's a very interesting topic we hadn't thought about. It seems our approach would work for that use case too, though. Currently we're defending against the LLM being gullible, and "gullible" and "actively malicious" aren't properties that behave too differently from the gateway's perspective. It's definitely on our radar now, thanks for bringing it up!

aaronharnly · 1h ago
"without risk", "solves", and "Guaranteed" are big words – you might want to temper them.
76SlashDolphin · 1h ago
Fair criticism! We wrote the README early on, when we were still ironing out the requirements. I'll fix it up shortly.
noddingham · 1h ago
Agreed. If someone could help answer the question of "how" I'd appreciate it. I'm currently skeptical but not sure I'm knowledgeable enough to prove myself right or wrong.

But it just seems to me that some of the 'vulnerabilities' are baked in from the beginning, e.g. control and data being in the same channel AFAIK isn't solvable. How is it possible to address that at all? Sure, we can do input validation, sanitization, access restriction, and a host of other things, but at the end of the day isn't there still a non-zero chance that something is exploited, and we're just playing whack-a-mole? Not to mention I doubt everyone will define things like "private data" and "untrusted" the same way. uBlock tells me when a link is on one of its lists, but I still go ahead anyway.

76SlashDolphin · 1h ago
At least in its current state, we just use an LLM to categorise each individual tool. We don't look at the data itself, although we have some ideas for improvements, as it is currently very "over-defensive". For example, if you have the filesystem MCP and a web search MCP, open-edison will block if you perform a filesystem read, a web search, and then a filesystem write. Even if you rarely perform writes, though, open-edison would still be useful for tracking things. The UX is such that after an initial block you can make an exception for the same flow the next time it occurs.
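The read → search → write example above can be sketched as a small state machine. This is a hypothetical illustration with made-up tool names and classifications (normally the classifications would come from the LLM at configure time), not the project's actual code:

```python
# Assumed example classifications, one trifecta "leg" per tool.
TOOL_FLAGS = {
    "fs_read":    {"private_data"},       # filesystem read: private data access
    "web_search": {"untrusted_content"},  # web search: untrusted content
    "fs_write":   {"external_comms"},     # a write can exfiltrate, so treat it as comms
}

def check_call(legs_so_far: set, tool: str):
    """Return (allowed, updated_legs); block the call that would complete all three."""
    legs = legs_so_far | TOOL_FLAGS.get(tool, set())
    if len(legs) == 3:
        return False, legs_so_far  # blocked: session state unchanged
    return True, legs
```

With this shape, the read and the search each go through, and only the final write (the step that completes all three legs) is refused.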
noddingham · 41m ago
Thanks for the follow up. I can see the value in trying to look at the chained read - search - write or similar patterns to alert the user. Awareness of tool activity is definitely helpful.
daveguy · 1h ago
Well, I guess 80-90% protective is better than nothing. Better might be a lock that requires positive confirmation by the user.
76SlashDolphin · 48m ago
It is possible to configure it like that: when a trifecta is detected, the gateway can wait for confirmation before allowing the last MCP call to proceed. The issue is that MCP clients are still in their early stages, and some of them don't like waiting a long time for a response and act in weird or inconvenient ways if something times out (some of them sensibly disable the entire server if a single tool times out, which in our case disables the entire gateway and therefore all MCP tools). As it is, it's much better to default to returning a block message and emit a web notification from the gateway dashboard, so the user can approve the use case and then rerun their previous prompt.
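The default behaviour described above (answer immediately rather than hold the MCP response open) might look roughly like this. A hypothetical sketch with made-up names, not the project's actual implementation:

```python
def handle_trifecta(tool: str, approved_flows: set, notify) -> dict:
    """Return a response right away so the MCP client never times out."""
    if tool in approved_flows:
        # The user previously approved this flow in the dashboard.
        return {"status": "allowed"}
    # Fire an out-of-band dashboard notification instead of blocking the call.
    notify(f"Trifecta detected for '{tool}' - approve it in the dashboard")
    return {
        "status": "blocked",
        "message": "Blocked: this call would complete the lethal trifecta. "
                   "Approve the flow in the dashboard, then rerun your prompt.",
    }
```

Because the block message is an ordinary tool response, even impatient MCP clients handle it gracefully, and the approval happens asynchronously in the dashboard.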
doctoboggan · 52m ago
Wouldn't the LLM running in the gateway also be susceptible to the same jailbreaks?
76SlashDolphin · 35m ago
That's a good question! We do use an LLM to categorise the MCP tools, but that happens at "add" or "configure" time, not when they are called. As such we don't actively run an LLM while the gateway is up; all the rules are already set, and requests are blocked based on those hard-set rules. Plus, at this point we don't actually look at the data being passed around, so even if we change the trifecta rules, there's no way for any LLM to be poisoned by a malicious actor feeding it bad data.
8note · 9m ago
Couldn't the configuring LLM be poisoned by tool descriptions into granting the lethal trifecta to the runtime LLM?
76SlashDolphin · 3m ago
It is possible that a malicious MCP could poison the LLM's ability to classify its tools, but then your threat model includes adding malicious MCPs, which would be a problem for any MCP client. We are considering adding a repository of vetted MCPs (or possibly using one of the existing ones), but as it is, we rely on the user to make sure their MCPs are legitimate.
warthog · 2h ago
Seen a hack using whatsapp mcp recently - this seems promising