MCP overlooks hard-won lessons from distributed systems
250 points by yodon | 140 comments | 8/9/2025, 2:42:10 PM | julsimon.medium.com
> MCP discards this lesson, opting for schemaless JSON with optional, non-enforced hints. Type validation happens at runtime, if at all. When an AI tool expects an ISO-8601 timestamp but receives a Unix epoch, the model might hallucinate dates rather than failing cleanly. In financial services, this means a trading AI could misinterpret numerical types and execute trades with the wrong decimal precision. In healthcare, patient data types get coerced incorrectly, potentially leading to wrong medication dosing recommendations. Manufacturing systems lose sensor reading precision during JSON serialization, leading to quality control failures.
Having worked with LLMs every day for the past few years, I find it easy to see every single one of these things happening.
I can practically see it playing out now: there is some huge incident of some kind, in some system or service with an MCP component somewhere, with some elaborate post-mortem revealing that some MCP server somewhere screwed up and output something invalid, the LLM took that output and hallucinated god knows what, its subsequent actions threw things off downstream, etc.
It would essentially be a new class of software bug caused by integration with LLMs, and it is almost sure to happen when you combine it with other sources of bugs: human error, the total lack of error checking or exception handling that LLMs are prone to (they just hallucinate), a bunch of gung-ho startups "vibe coding" new services on top of the above, etc.
I foresee this being followed by a slew of Twitter folks going on endlessly about AGI hacking the nuclear launch codes, which will probably be equally entertaining.
Before 2023 I always thought that all the bugs and glitches of technology in Star Trek were totally made up and would never happen this way.
Post-LLM I am absolutely certain that they will happen exactly that way.
I am not sure what LLM integrations have to do with engineering anymore, or why it makes sense to essentially put all your company's infrastructure under external control. And that is not even scratching the surface of the lack of reproducibility at every single step of the way.
It "somehow works" isn't engineering.
So very much like an LLM accessing multiple pieces of functionality across different tools and API endpoints (if you want to imagine it that way).
While it is seemingly very knowledgeable, it is rather stupid. It gets duped by nefarious actors, or exhibits elementary bugs that put the crew into awkward positions.
Most professional software engineers might previously have looked at these scenarios as implausible, given that the "failure model" of current software is quite blunt, and especially given how far into the future the series took place.
Now we see that computational tasks are becoming less predictable and less straightforward, with cascading failures instead of blunt, direct failures. Interacting with an LLM when it starts to hallucinate might be compared to talking with a person in psychosis.
So you get things like this in the Star Trek universe: https://www.youtube.com/watch?v=kUJh7id0lK4
Which makes a lot more sense, and becomes a lot more plausible and relatable, given our current implementations of AI/LLMs.
But it sure is fast.
The author even later says that MCP supports JSON Schema, yet also claims "you can't generate type-safe clients". Which is plainly untrue: plenty of JSON Schema code generators exist.
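For what it's worth, going from a tool's JSON Schema to a typed client is mechanical. A sketch (using the json-schema-to-typescript package's json2ts CLI; the schema and names are made up for illustration):

    // car.schema.json (illustrative):
    // { "type": "object",
    //   "properties": { "make":  { "type": "string" },
    //                   "model": { "type": "string" },
    //                   "year":  { "type": "integer" } },
    //   "required": ["make", "model", "year"] }

    // json2ts car.schema.json emits roughly:
    export interface Car {
      make: string;
      model: string;
      year: number;
    }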
Claude will happily cast your int into a 2023 Toyota Yaris and keep on hallucinating things.
> Cast an integer into the type of a 2023 Toyota Yaris using Javascript
(GPT-4o mini)
> To cast an integer into the type of a 2023 Toyota Yaris in JavaScript, you would typically create a class or a constructor function that represents the Toyota Yaris. Then, you can create an instance of that class using the integer value. Here's an example of how you might do this:
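A reconstruction of the sort of snippet it then produces (illustrative, not the verbatim reply):

    // The model happily complies: wrap the integer in a Yaris "type".
    class ToyotaYaris2023 {
      value;
      constructor(value) {
        this.value = value; // the "cast" integer
      }
    }

    const yaris = new ToyotaYaris2023(42);
    console.log(yaris instanceof ToyotaYaris2023); // true; no objection raised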
Claude Code validated the response against the schema and did not pass the response to the LLM.
It works in this instance. On this run. It is not guaranteed to work next time. There is an error percentage here that makes it _INEVITABLE_ that eventually, with enough executions, the validation will pass when it should fail.
It will choose not to pass this to the validator, at some point in the future. It will create its own validator, at some point in the future. It will simply pretend like it did any of the above, at some point in the future.
This might be fine for your B2B use case. It is not fine for the underlying infrastructure of a financial firm or a communications provider.
Can you guarantee it will validate it every time? Can you guarantee that the way MCPs/tool calling are implemented (which is already an incredible joke that only python-brained developers would inflict upon the world) will always go through the validation layer? Are you even sure which part of Claude handles this validation? Sure, it didn't cast an int into a Toyota Yaris. Will it cast "70Y074" into one? Maybe a 2022 one. What if there are parsing rules embedded in a string; will it respect them every time? What if you use it outside of Claude Code and just ask nicely through the API; can you guarantee this validation still works? Or that they won't break it next week?
The whole point of it is, whichever LLM you're using is already too dumb to not trip when lacing its own shoes. Why you'd trust it to reliably and properly parse input badly described by a terrible format is beyond me.
Yes, to the extent you can guarantee the behavior of any third-party software, you can (and you can't really guarantee that no matter what spec the software supposedly implements, so those gaps aren't an MCP issue). "The app enforces schema compliance before handing the results to the LLM" is deterministic behavior in the traditional app that provides the toolchain, i.e. the interface between the tools (and the user) and the LLM; it is not non-deterministic behavior driven by the LLM. Hence, "before handing the results to the LLM".
> The whole point of it is, whichever LLM you're using is already too dumb to not trip when lacing its own shoes. Why you'd trust it to reliably and properly parse input badly described by a terrible format is beyond me.
The toolchain is parsing, validating, and mapping the data into the format preferred by the chosen model's prompt template. The LLM has nothing to do with that, because by definition it has to happen before the LLM can see the data.
You aren't trusting the LLM.
The LLM has everything to do with that. The LLM is literally choosing to do that. I don't know why this point keeps getting missed or side-stepped.
It WILL, as a matter of statistical certainty and given enough executions, simply not do the above, or pretend to do the above, or do something totally different at some point in the future.
How does the AI bypass the MCP layer to make the request? The assumption is (as I understand it) the AI says “I want to make MCP request XYZ with data ABC” and it sends that off to the MCP interface which does the heavy lifting.
If the MCP interface is doing the schema checks, and tossing errors as appropriate, how is the AI routing around this interface to bypass the schema enforcement?
The MCP interface (Claude Code in this case) is doing the schema checks. Claude Code will refuse to provide the result to the LLM if it does not pass the schema check, and the LLM has no control over that.
No, the LLM doesn't control on a case-by-case basis what the toolchain does between the LLM putting a tool call request in an output message and the toolchain calling the LLM afterwards.
If the toolchain is programmed to always validate tool responses against the JSON Schema provided by the MCP server before mapping them into the LLM prompt template and calling the LLM again to handle the response, that is going to happen 100% of the time. The LLM doesn't choose it. It CAN'T, because the only way it even knows that the data has come back from the tool call is that the toolchain has already done whatever it is programmed to do, ending with mapping the response into a prompt and calling the LLM again.
Even before MCP, or even models specifically trained for tool calling with vendor-provided templates (but after the ReAct architecture was described), it was like a weekend project to implement a basic framework supporting tool calling around a local or remote LLM. I don't think you need to do that to understand how silly the claim is that the LLM controls what the toolchain does with each response and might make it skip validation, but doing it will certainly give you a visceral understanding of how silly it is.
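A minimal sketch of the deterministic part of such a loop, using Ajv for JSON Schema validation (the names and schema are hypothetical, not Claude Code's actual internals):

    import Ajv from "ajv";

    const ajv = new Ajv();

    // Output schema discovered from the MCP server for one tool.
    const carSchema = {
      type: "object",
      properties: { make: { type: "string" }, model: { type: "string" } },
      required: ["make", "model"],
    };
    const validateCar = ajv.compile(carSchema);

    // This gate runs on every tool response, before the LLM is called again.
    // The LLM has no code path by which it can skip it.
    function gateToolResult(raw: unknown) {
      return validateCar(raw)
        ? { isError: false, content: raw }
        : { isError: true, content: ajv.errorsText(validateCar.errors) };
    }

    gateToolResult({ make: "Toyota", model: "Tercel" }); // passes through
    gateToolResult(42); // rejected; the LLM only ever sees the error text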
The pieces here are:
* Claude Code, a Node (Javascript) application that talks to MCP server(s) and the Claude API
* The MCP server, which exposes some tools over stdio or HTTP
* The Claude API, which is more structured than "text in, text out".
* The Claude LLM behind the API, which generates a response to a given prompt
Claude Code is a Node application. CC is configured in JSON with a list of MCP servers. When CC starts up, CC's JavaScript initialises each server and, as part of that, gets a list of callable functions.
When CC calls the LLM API with a user's request, it's not just "here is the user's words, do it". There are multiple slots in the request object, one of which is a "tools" block, a list of the tools that can be called. Inside the API, I imagine this is packaged into a prefix context string like "you have access to the following tools: tool(args) ...". The LLM API probably has a bunch of prompts it runs through (figure out what type of request the user has made, maybe using different prompts to make different types of plan, etc.) and somewhere along the way the LLM might respond with a request to call a tool.
The LLM API call then returns the tool call request to CC, in a structured "tool_use" block separate from the freetext "hey good news, you asked a question and got this response". The structured block means "the LLM wants to call this tool."
CC's JS then calls the server with the tool request and gets the response. It validates the response (e.g., JSON schemas) and then calls the LLM API again bundling up the success/failure of the tool call into a structured "tool_result" block. If it validated and was successful, the LLM gets to see the MCP server's response. If it failed to validate, the LLM gets to see that it failed and what the error message was (so the LLM can try again in a different way).
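Concretely, the round trip looks roughly like this (shapes abbreviated from the Anthropic Messages API; the values are illustrative):

    // What the API returns when the LLM wants a tool call:
    const assistantTurn = {
      role: "assistant",
      content: [{ type: "tool_use", id: "toolu_abc123",
                  name: "get_car", input: { vin: "JT2..." } }],
    };

    // What Claude Code sends back after the MCP response validated:
    const success = {
      role: "user",
      content: [{ type: "tool_result", tool_use_id: "toolu_abc123",
                  content: "Toyota Tercel" }],
    };

    // ...or after it failed validation:
    const failure = {
      role: "user",
      content: [{ type: "tool_result", tool_use_id: "toolu_abc123",
                  is_error: true, content: "Expected string, got integer (42)" }],
    };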
The idea is that if a tool call is supposed to return a CarMakeModel string ("Toyota Tercel") and instead returns an int (42), JSON Schemas can catch this. The client validates the server's response against the schema, and calls the LLM API with a structured error instead of the invalid value.
So the LLM isn't choosing to call the validator; the deterministic JavaScript that is Claude Code chooses to call the validator. There are plenty of ways for this to go wrong: the client (Claude Code) has to validate; int vs. string isn't the same as "is a valid timestamp/CarMakeModel/etc."; if you helpfully put the thing that failed into the error message ("Expected string, got integer (42)") then the LLM gets the 42 and might choose to interpret it as a CarMakeModel if it's having a particularly bad day; the LLM might say "well, that didn't work, but let's assume the answer was Toyota Tercel, a common car make and model", ... We're reaching here, yet these are possible.
But the basic flow has validation done in deterministic code and hiding the MCP server's invalid responses from the LLM. The LLM can't choose not to validate. You seemed to be saying that the LLM could choose not to validate, and your interlocutor was saying that was not the case.
I hope this helps!
I can't guarantee that behavior will remain the same any more than for any other software. But all this happens before the LLM is even involved.
> The whole point of it is, whichever LLM you're using is already too dumb to not trip when lacing its own shoes. Why you'd trust it to reliably and properly parse input badly described by a terrible format is beyond me.
You are describing why MCP supports JSON Schema. It requires parsing & validating the input using deterministic software, not LLMs.
No. It is not. You are still misunderstanding how this works. It is "choosing" to pass this to a validator or some other tool, _for now_. As a matter of pure statistics, it will simply not do this at some point in the future on some run.
It is inevitable.
Or write a simple MCP server and a client that uses it. FastMCP is easy: https://gofastmcp.com/getting-started/quickstart
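The official TypeScript SDK is about as small; roughly, per its docs (the "add" tool is the stock example):

    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";

    const server = new McpServer({ name: "demo", version: "1.0.0" });

    // Input schema declared with zod; the SDK derives the JSON Schema
    // that clients validate arguments against.
    server.tool("add", { a: z.number(), b: z.number() }, async ({ a, b }) => ({
      content: [{ type: "text", text: String(a + b) }],
    }));

    await server.connect(new StdioServerTransport());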
You are quite wrong. The LLM "chooses" to use a tool, but the input (provided by the LLM) is validated with JSON Schema by the server, and the output is validated by the client (Claude Code). The output is not provided back to the LLM if it does not comply with the JSON Schema, instead an error is surfaced.
It is absolutely possible to do this, and to generate client code which complies with ISO-8601 in JS/TS. Large amounts of financial services would not work if this was not the case.
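For instance, a sketch of enforcing ISO-8601 at a type boundary in TypeScript (a branded type; the names are illustrative):

    type Iso8601 = string & { readonly __brand: "Iso8601" };

    const ISO_8601 =
      /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

    function asIso8601(s: string): Iso8601 {
      if (!ISO_8601.test(s) || Number.isNaN(Date.parse(s))) {
        throw new TypeError(`not an ISO-8601 timestamp: ${s}`);
      }
      return s as Iso8601;
    }

    asIso8601("2025-08-09T14:42:10Z"); // ok
    asIso8601("1723214530");           // throws: a Unix epoch sneaking through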
See the C# support for ISO-8601 strings: https://learn.microsoft.com/en-us/dotnet/standard/base-types...
`DateTime` is not an ISO-8601 type. It can _parse_ an ISO-8601 formatted string.
And even past that, there are Windows-specific idiosyncrasies with how the `DateTime` class implements the parsing of these strings and how it stores the resulting value.
This is exactly the point: a string is just a data interchange format in the context of a DateTime, and C# provides (as far as I can tell) a complete way of mapping the ISO-8601 specification onto a language object. It also supports type-safe generation of clients and client objects (or structs) from the ISO-8601 string format.
> And even past that, there are Windows-specific idiosyncrasies with how the `DateTime` class implements the parsing of these strings and how it stores the resulting value.
Not really. The Windows statements in the article (and I use this on Linux for financial-services software) relate to automatic setting of the preferences for generated strings. All of these may be set within the code itself.
Related to, but distinct from, serialization.
The merchants of complexity are disappointed. It turns out that even machines don't care for 'machine-readable' formats; even the machines prefer human-readable formats.
The only entities on this planet who appreciate so-called 'machine-readability' are bureaucrats; and they like it for the same reason that they like enterprise acronyms... Literally the opposite of readability.
LLMs are basically automating PEBKAC
May have changed, but unlikely. I worked with medical telemetry as a young man and it was impressed upon me thoroughly how important parsing timestamps correctly was. I have a faint memory, possibly false, of this being the first time I wrote unit tests (and without the benefit of a test framework).
We even accounted for the lack of NTP by recalculating times off of the timestamps in their message headers.
And the reasons I was given were incident review as well as malpractice cases. A drug administered three seconds before a heart attack starts is a very different situation from one administered eight seconds after the patient crashed. We saw recently with the British Post Office how lives can be ruined by bad data, and in medical data a minute is a world of difference.
I also work in healthcare, and we've seen HL7v2 messages with impossible timestamps. (E.g., in the spring-forward gap.)
As RPC mechanisms go, HTTP is notable for how few of the classic blunders they made in 1.0 of the spec. Clock skew correction is just my favorite. Technically it exists for cache directives, but it’s invaluable for coordination across machines. There are reasons HTTP 2.0 waited decades to happen. It just mostly worked.
this is like saying "HTTP doesn't do json validation", which, well, yeah.
The stuff about type validation is incorrect. You don't need client-side validation. You shouldn't be using APIs you don't trust as tools, and you can always instruct the LLM to convert its output to a different format.
MCP is not the issue. The issue is that people are using the wrong tools or their prompts are bad.
If you don't like the format of an MCP tool and don't want to give formatting instructions to the LLM, you can always create your own MCP service which outputs data in the correct format. You don't need the coercion to happen on the client side.
When desktop OSes came out, hardware resources were scarce, so all the desktop OSes (DOS, Windows, MacOS) forgot all the lessons from Unix: multi-user support, preemptive multitasking, etc. 10 years later PC hardware was faster than workstations from the 90s, yet we're still stuck with OSes riddled with limitations that stopped making sense in the 80s.
When smartphones came out there was this gold rush and hardware resources were scarce so OSes (iOS, Android) again forgot all the lessons. 10 years later mobile hardware was faster than desktop hardware from the 00s. We're still stuck with mistakes from the 00s.
AI is basically doing the same thing. It's all led by very bright 20- and 30-year-olds who weren't even born when Windows was first released.
Our field is doomed under a Cascade of Attention-Deficit Teenagers: https://www.jwz.org/doc/cadt.html (copy paste the link).
It's all gold rushes, and nobody does Dutch urban infrastructure design over decades. Which makes sense, as this is all driven by the US, where long-term planning is anathema.
Of course this keeps happening
If an LLM can be shown to be useful 80% of the time, to the JS mindset this is fine, and the remaining 20% can be resolved once we're being paid for the rest, Pareto principle be damned.
Mostly, no. Whether it's the client sending (statically) bad data or the server returning (statically) bad data, schema validation on the other end (assuming it is somehow allowed by the toolchain on the sending end) should reject it before it gets to the custom code of the MCP server or MCP client.
For arguments that are the right type but wrong because of the state of the universe, yes, the server receiving it should send a useful error message back to the client. But that's a different issue.
At some point we have to decide as a community of engineers that we have to stop building tools that are little more than loaded shotguns pointed at our own feet.
GIEMGO: garbage in, even more garbage out.
Ironically, it's achieved this - but that's an indictment of USB-C, not an accomplishment of MCP. Just like USB-C, MCP is a nigh-universal connector with very poorly enforced standards for what actually goes across it. MCP's inconsistent JSON parsing and lack of protocol standardization is closely analogous to USB-C's proliferation of cable types (https://en.wikipedia.org/wiki/USB-C#Cable_types); the superficial interoperability is a very leaky abstraction over a much more complicated reality, which IMO is worse than just having explicitly different APIs/protocols.
Previously, you could reasonably expect any USB-C port on an Apple Silicon desktop/laptop to be USB4 40Gbps Thunderbolt, capable of anything and everything you might want to use it for.
Now, some of them are USB3 10Gbps. Which ones? Gotta look at the specs or tiny icons, I guess?
Apple could have chosen to have the self-documenting USB-A ports to signify the 10Gbps limitation of some of these ports (conveniently, USB-A is limited to exactly 10Gbps, making it perfect for the use-case of having a few extra "low-speed" ports at very little manufacturing cost), but instead, they've decided to further dilute the USB-C brand. Pure innovation!
With the end user likely still having to use a USB-C to USB-A adapter anyway, because the majority of thumb drives, keyboards and mice still require a USB-A port, even the USB-C ones that use USB-C on the keyboard/mouse itself. (But, of course, that's all irrelevant, because you can always spend 2x+ as much for a USB-C version of any of these devices; the fact that the USB-C variants are less common or inferior to USB-A is of course irrelevant when hype and fanaticism are more important than utility and usability.)
Unfortunately, as usual, no one understood SOAP back then.
(Additional context: Maintaining a legacy SOAP system. I have nothing good to say about SOAP and it should serve as a role model for no one)
It doesn't take very long for people to start romanticizing things as soon as they're not in vogue. Even when the painfulness is still fresh in memory, people lament over how stupid new stuff is. Well I'm not a fan of schemaless JSON APIs (I'm one of those weird people that likes protobufs and capnp much more) but I will take 50 years of schemaless JSON API work over a month of dealing with SOAP again.
“XML is like violence: if it’s not working, just use more!”
No.
SOAP uses that, but SOAP involves a whole lot of spec about how you do that, and that's even before (as the article seems to do) treating SOAP as meaning SOAP plus the set of WS-* standards built around it.
Unfortunately as usual when a new technology cycle comes, everything gets thrown away, including the good parts.
And I actually like XML-based technologies. XML Schema is still unparalleled in its ability to compose and verify the format of multiple document types. But man, SOAP was such a beast for no real reason.
Instead of a simple spec for remote calls, it turned into a spec that described everything and nothing at the same time. SOAP supported all kinds of transport protocols (SOAP over email? Sure!), RPC with remote handles (like CORBA), regular RPC, self-describing RPC (UDDI!), etc. And nothing worked out of the box, because the nitty-gritty details of authentication, caching, HTTP response code interoperability and other "boring" stuff were just left as an exercise to the reader.
Part of this is the nature of XML. There's a million ways to do things. Should some data be parsed as an attribute of the tag, or should it be another tag? Perhaps the data should be in the body between the tags? HTML, which shares XML's SGML lineage, has this problem; e.g., you can seriously specify <font face="Arial">text</font> rather than have the font as a property of the wrapping tag. There's a million ways to specify everything and anything, and that's why it makes a terrible data format. The reader and writer must have the exact same schema in mind, and there's no way to have a default, because there's simply no particular correct way to do things in XML. So everything had to be very, very precisely specified, to the point that it added huge amounts of work where a non-XML format with decent defaults would not.
This became a huge problem for SOAP, and why I hate it. Every implementation had different default ways of handling even the simplest data structures passed between them, and they were never compatible unless you took weeks of time to specify the schema down to a fine-grained level.
In general, XML is problematic due to the lack of clear canonical ways of doing pretty much anything. You might say "but I can specify it with a schema", and to that I say: my problem with XML is that you need a schema for even the simplest use case in the first place.
But parts of XML infrastructure were awesome. I could define a schema for the data types, and have my IDE auto-complete and validate the XML documents as I typed them. I could also validate the input/output data and provide meaningful errors.
And yeah, I also worked with XML and got burned many times by small incompatibilities that always happen due to its inherent complexity. If XML were just a _bit_ simpler, it could have worked so much better.
Generally it worked very well when both ends were written in the same programming language and was horseshit if they weren’t. No wonder Microsoft liked SOAP so much.
IBM thought they were good at lockin, until Bill Gates came along.
I've been on the other side of high-feature serialization protocols, and even at large tech companies, something like migrating to gRPC is a multi-year slog that can even fail a couple of times because it asks so much of you.
MCP, at its core, is a standardization of a JSON API contract, so you don't have to do as much post-training to generate various tool calling style tokens for your LLM.
I think you meant that is why JSON won instead of XML?
Not just XML, but a lot of other serialization formats and standards, like SOAP, protobuf in many cases, yaml, REST, etc.
People say REST won, but tell me how many places actually implement REST or just use it as a stand-in term for casual JSON blobs to HTTP URLs?
Now, YAML has quite a few shortcomings compared to JSON (if you don't believe me, look at its handling of the string no, discussed on HN), so, at least to me, it's obvious why JSON won.
SOAP, don't get me started on that, it's worth less than XML, protobuf is more efficient but less portable, etc.
That's backwards reasoning. XML was too complicated, so they decided on a simpler JSON.
And its complexity and size now are rivaling the specs of the good old XML-infused times.
Didn't get that job: one of the interviewers asked me to write concurrent code and didn't like my answer, but his had a race condition in it, and I was unsuccessful in convincing him he was wrong. He was relying on preemption not occurring on a certain instruction (or multiprocessing not happening). During my tenure at the job I did take, the real flaws in the Java Memory Model came out, and his answer became very wrong and mine only slightly wrong.
So it baked in core assumptions that the network is transparent, reliable, and symmetric. So you could create an object on one machine, pass a reference to it to another machine, and everything is supposed to just work.
Which is not what happens in the real world, with timeouts, retries, congested networks, and crashing computers.
Oh, and CORBA C++ bindings had been designed before the STL was standardized. So they are a crawling horror, other languages were better.
On a more general note, I see in many critical comments here what I perceive to be a category error. Using JSON to pass data between web client and server, even in more complex web apps, is not the same thing as supporting two-way communications between autonomous software entities that are tasked to do something, perhaps something critical. There could be millions of these exchanges in some arbitrarily short time period, thus any possibility of error is multiplied accordingly, and the effect of any error could cascade if it does not fail early. I really don't believe this is a case where "worse is better." To use an analogy: yes, everyday English is a versatile language that works great for most use cases; but when you really need to nail things down, with no tolerance for ambiguity, you get legalese or some other jargon. Or CORBA, or gRPC, etc.
- SOAP - interop requires supporting document-style or RPC-style bindings between systems, or a combination; XML and its schemas are also horribly verbose.
- CORBA - the libraries and frameworks were complex; modern languages at the time avoided them in favor of simpler standards (e.g. Java's Jini).
- gRPC - designed for speed, not readability; requires mappings.
It's telling that these days REST and JSON (via req/resp, webhooks, or even streaming) are the modern backbone of RPC. The above standards are either shoved aside or, in gRPC's case, used only where extreme throughput is needed.
Since REST and JSON are the plat du jour, MCP probably aligns with that design paradigm rather than the dated legacy protocols.
The greater problem is industry misunderstanding and misalignment with what agents are and where they are headed.
Web platforms of the world believe agents will be embedded in networked distributed infrastructure. So we should ship an MCP platform in our service mesh for all of the agents running in containers to connect to.
I think this is wrong, and continues to be butchered as the web pushes a hard narrative that we need to enable web-native agents & their sdks/frameworks that deploy agents as conventional server applications. These are not agents nor the early evolutionary form of them.
Frontier labs will be the only providers of the actual agentic harnesses. And we are rapidly moving to computer-use agents. MCP servers were intended to serve as single-instance deployments for single harnesses, i.e. a single MCP server on my desktop for my Claude Desktop.
> In financial services, this means a trading AI could misinterpret numerical types and execute trades with the wrong decimal precision.
If you are letting an LLM execute trades with no guardrails then it is a ticking time bomb no matter what protocol you use for the tool calls.
> When an AI tool expects an ISO-8601 timestamp but receives a Unix epoch, the model might hallucinate dates rather than failing cleanly.
If your process breaks because of a hallucinated date -- don't use an LLM for it.
You'd still need basically the entire existing MCP spec to cover the use cases if it replaced JSON-RPC with Swagger or protobuf, plus additional material to cover the gaps and complications that that switch would involve.
I agree that swagger leaves a lot unplanned. I disagree about the local use case because (1) we could just run local HTTP servers easily and (2) I frankly assume the future of MCP is mostly remote.
Returning to JSON-RPC: it's a poorly executed RPC protocol. Here is an excellent Hacker News thread on it, but the TL;DR is that parsing JSON is expensive and complex; we have tons of tools (e.g., load balancers) that make up modern services, and making those tools parse JSON is very expensive. Many people in the thread below mention alternative ways to implement JSON-RPC, but those depend on new clients.
https://news.ycombinator.com/item?id=34211796
I know this because I wish it did. You can approximate streaming responses by using progress notifications. If you want something like the LLM's partial-response streaming, you'll have to extend MCP with custom capability flags. It's totally possible to extend it this way, but then it's non-standard.
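For reference, the progress mechanism is just a token in the request's _meta echoed back in notifications, roughly like this (values illustrative):

    // Client request carries a progress token:
    const request = {
      jsonrpc: "2.0", id: 1, method: "tools/call",
      params: { name: "long_job", arguments: {},
                _meta: { progressToken: "job-1" } },
    };

    // Server emits notifications as the job advances; no partial
    // results, just a counter:
    const notification = {
      jsonrpc: "2.0", method: "notifications/progress",
      params: { progressToken: "job-1", progress: 50, total: 100 },
    };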
Perhaps you are alluding to the fact that it's a bidirectional protocol (by spec, at least).
It seems to be a game of catch-up for most things AI. That said, my school of thought is that certain technologies are just too big to be figured out early on (web frameworks, blockchain, ...), and the gap starts to shrink eventually. With AI, we'll just have to keep sharing ideas and caution like you have here. Such very interesting times we live in.
Multics vs. Unix, XML-based SOAP vs. JSON-based REST APIs, XHTML's failure, JavaScript itself... I could keep going.
So I’ve resigned myself to admitting that we are doomed to reimplement the “good enough” every time, and continue to apply bandaid after bandaid to gradually fix problems after we rediscover them, slowly.
https://en.m.wikipedia.org/wiki/Worse_is_better
It's been confirmed over and over since then. And I say that as someone who naturally gravitates towards "better" solutions.
The world we could have lived in... working web forms validations, working microdata...
Sure, they might still find themselves in highly regulated industries where risk avoidance trumps innovation every day, all day.
MCP is for _the web_; it started with stdio only because Anthropic was learning lessons from building Claude Code.
Author also seems to expect that the result from MCP tool usage will feed directly to an LLM. This is preposterous and a recipe for disaster. Obviously you'd validate structured responses against a schema, check for harmful content, etc. etc.
> Author also seems to expect that the result from MCP tool usage will feed directly to an LLM
Isn't this exactly what MCP is for? Most tools I've come across are to feed context from other sources directly to the LLM. I believe this is the most common use-case for the protocol.
MCP is not a protocol. It doesn't protocolize anything of use. It's just "here are some symbols, do with them whatever you want", leaving it at that but then advertising that as a feature of its universality. It provides about as much of a protocol as TCP does, but rebuilt on top of five OSI layers, again.
It's not a security issue, it's an ontological issue.
That being said, MCP as a protocol has a fairly simple niche: provide context that can be fed to a model to perform some task. MCP covers the discovery process around presenting those tools and resources to an Agent in a standardized manner. And it includes several other aspects that are useful in this niche, things like "sampling" and "elicitations". Is it perfect? Not at all. But it's a step in the right direction.
The crowd saying "just point it at an OpenAPI service" does not seem to fully understand the current problem space. Can many LLMs extract meaning from un-curated API response messages? Sure. But they are also burning up context holding junk that isn't needed. Part of MCP is the acknowledgement that general API responses aren't the right way to feed the model the context it needs. MCP is supposed to take a concrete task, perform all the activities needed to gather the info or effect the change, then generate clean context meant for the LLM. If you design an OpenAPI service around those same goals, then it could easily be added to an Agent. You'd still need to figure out all the other aspects, but you'd be close. But at that point you aren't pointing an Agent at a random API, you're pointing it at a purpose-made API. And then you have to wonder: why not something like MCP, designed for that purpose from the start?
I'll close by saying there are an enormous number of MCP servers out there that are poorly written, thin wrappers on general APIs, or bad in some other way. I attribute a lot of this to the rise of AI coding agents enabling people with poor comprehension of the space to crank out this... noise.
There are also great examples of MCP Servers to be found. They are the ones that have thoughtful designs, leverage the spec fully, and provide nice clean context for the Agent to feed to the LLM.
I can envision a future where we can simply point an agent at a series of OpenAPI services and the agent uses its models to self-assemble what we consider the MCP server today. Basically, it would curate access to the APIs into a set of focused tools and the code needed to generate the final context. That's not quite where we are today. It's likely not far off, though.
Also Erlang uses RPCs for pretty much all "synchronous" interactions but it's pretty minimal in terms of ceremony. Seems pretty reliable.
So this is a serious question, because hand-rolling "40 years" of best practices seems hard: what should we be using for RPC?
This is really obvious when they talk about tracing and monitoring, which seem to be the main points of criticism anyway.
They bemoan that they can't trace across MCP calls, assuming somehow there would be a person administering all the MCPs. Of course each system has tracing in whatever fashion fits that system. They are just not the same system, nor owned by the same people, let alone companies.
Same with monitoring cost. Oh, you can't know who racked up the LLM costs? Well of course you can; these systems are already in place and there are a million ways to do this. It has nothing to do with MCP.
Reading this, I think it's rather a blessing to start fresh, without the learnings of 40 years of failed protocols or whatever.
1. Lessons.
2. Fairly sure all of Google is built on top of protobuf.
Point-by-point for the article's gripes:
- distributed tracing/telemetry - open discussion at https://github.com/modelcontextprotocol/modelcontextprotocol...
- structured tool annotation for parallelizability/side-effects/idempotence - this actually already exists at https://modelcontextprotocol.io/specification/2025-06-18/sch... but it's not well documented in https://modelcontextprotocol.io/specification/2025-06-18/ser... - someone should contribute to improving this!
- a standardized way in which the costs associated with an MCP tool call can be communicated to the MCP Client and reported to central tracking - nothing here I see, but it's a really good idea!
- serialization issues e.g. "the server might report a date in a format unexpected by the client" - this isn't wrong, but since the consumer of most tool responses is itself an LLM, there's a fair amount of mitigation here. And in theory an MCP Client can use an LLM to detect under-specified/ambiguous tool specifications, and could surface these issues to the integrator.
Now, I can't speak to the speed at which Maintainers and Core Maintainers are keeping up with the community's momentum - but I think it's meaningful that the community has momentum for evolving the specification!
I see this post in a highly positive light: MCP shows promise because you can iterate on these kinds of structured annotations, in the context of a community that is actively developing their MCP servers. Legacy protocols aren't engaging with these problems in the same way.
Actually, MCP uses a normative TypeScript schema (and, from that, an autogenerated JSON Schema) for the protocol itself, and the individual tool calls also are specified with JSON Schema.
> Type validation happens at runtime, if at all.
That's not a consequence of MCP "opting for schemaless JSON" (which it factually does not); that's, for tool calls, a consequence of MCP being a discovery protocol where the tools, and thus the applicable schemas, are discovered at runtime.
If you are using MCP as a way to wire up highly static components, you can do discovery against the servers once they are wired up, statically build the clients around the defined types, and build your toolchain to raise errors if the discovery responses change in the future. But that's not really the world MCP is built for. Yes, that means that the toolchain needs, if it is concerned about schema enforcement, to use and apply the relevant schemas at runtime. So, um, do that?
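"Doing that" is not much code. A sketch, with Ajv, of compiling validators once from a tools/list discovery response and enforcing them on every call (the shapes are illustrative):

    import Ajv, { type ValidateFunction } from "ajv";

    const ajv = new Ajv();
    const validators = new Map<string, ValidateFunction>();

    // After discovery: compile each tool's inputSchema once.
    function registerTools(tools: { name: string; inputSchema: object }[]) {
      for (const t of tools) validators.set(t.name, ajv.compile(t.inputSchema));
    }

    // At call time: statically-bad arguments never reach the server.
    function checkArgs(tool: string, args: unknown): string | null {
      const v = validators.get(tool);
      if (!v) return `unknown tool: ${tool}`;
      return v(args) ? null : ajv.errorsText(v.errors);
    }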
- Electron disregards 40 years of best deployment practices,
- Web disregards 40 years of best GUI practices,
- Fast CPUs and lots of RAM disregards 40 years of best software optimization techniques,
there are probably many more examples.
Windows 10 is easier to use than Windows 95.
macOS is easier to use than classic Mac OS, or whatever they named their old versions.
It goes on and on. I can have 50 browser tabs open at the same time, each one hosting a highly complicated app, ranging from media playback to chat rooms to custom statistical calculators. I don't need to install anything for any of these apps, I just type in a short string in my url bar. And they all just work, at the same time.
Things are in fact better now.
The ISO8601 v Unix epoch example seems very weak to me. I'd certainly expect any model to be capable of distinguishing between these things, so, it doesn't seem like a big deal that either one would be allowed in a JSON.
Honestly, my view that nothing of value ever gets published on medium, is strongly reinforced here.
But why did the designers make that choice when they had any of half a dozen other RPC protocols to choose from?
> The ISO8601 v Unix epoch example seems very weak to me. I'd certainly expect any model to be capable of distinguishing between these things
What about the medical records issue? How is the model to distinguish a weight in kgs from one in pounds?
Wouldn't medical records actually be better in JSON, where the field could expressly have a "kg" or "lb" suffix within the value of the field itself, or even in the name of the field, like "weight-in-kg" or "weight-in-lb"? This is actually the beauty of JSON compared to other formats where these things may end up being just a unitless integer.
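For instance (field names illustrative):

    // Units carried in the data itself rather than implied out-of-band:
    const record = {
      patient_id: "12345",
      weight: { value: 70, unit: "kg" }, // explicit unit object
      // ...or encoded in the field name:
      weight_in_kg: 70,
    };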
The biggest problem with medical data would probably remain the human factor, where regardless of the format used by the machines and by MCP, the underlying data may already be incorrect or not coded properly, so, if anything, AI would likely have a better chance of interpreting the data correctly than the API provider blindly mislabelling unitless data.
On that note: some of these "best practices" arguably haven't worked out. "Be conservative in what you send, liberal in what you accept" has turned even decent protocols into a dumpster fire, so why keep the charade going?
Failed protocols such as TCP adopted Postel's law as a guiding principle, and we all know how that worked out!
WSDL is just pure nonsense. The idea that software would need to decide on its own which API endpoints it needs is just profoundly misguided... Literally nobody and nothing ever reads the WSDL definitions; it's just a poor man's documentation, at best.
LLMs only reinforce the idea that WSDL is a dumb idea because it turns out that even the machines don't care for your 'machine-friendly' format and actually prefer human-friendly formats.
Once you have an MCP tool working with a specific JSON API, it will keep working unless the server makes breaking changes to the API while in production, which is terrible practice. But anyway, if you use a server, it means you trust the server. Client-side validation is dumb; like people who need to put tape over their mouths because they don't trust themselves to follow through on their diet plans.
WSDLs being available from the servers allows (a) clients to validate the requests they make before sending them to the server, and (b) developers (or in principle even AI) with access to the server to create a client without needing further out-of-band specifications.
I don't buy this idea that code should be generated automatically without a human involved (at least as a reviewer).
I also don't buy the idea that clients should validate their requests before sending to the server. The client's code should trust itself. I object to any idea of code (or any entity) not trusting itself. That is a flawed trust model.