Show HN: Robot MCP Server – Connect Any Language Model and ROS Robots Using MCP

20 points by r-johnv on 9/10/2025, 12:58:29 PM | 21 comments | github.com
We’ve open-sourced the Robot MCP Server, a tool that lets large language models (LLMs) talk directly to robots running ROS1 or ROS2.

What it does:
- Connects any LLM to an existing ROS robot via the Model Context Protocol (MCP)
- Translates natural language into ROS topics, services, and actions, and can read any of them back (sketch below)
- Works without changing robot source code
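
For a rough sense of the plumbing, here is what a single tool call reduces to once it reaches rosbridge, sketched with the roslibpy client (illustrative, not our exact code):

    import roslibpy

    # Connect to the robot's rosbridge websocket endpoint (default port 9090)
    client = roslibpy.Ros(host='localhost', port=9090)
    client.run()

    # Publish a velocity command; the MCP server exposes calls like this as tools
    cmd_vel = roslibpy.Topic(client, '/cmd_vel', 'geometry_msgs/Twist')
    cmd_vel.publish(roslibpy.Message({
        'linear': {'x': 0.2, 'y': 0.0, 'z': 0.0},
        'angular': {'x': 0.0, 'y': 0.0, 'z': 0.1},
    }))

    client.terminate()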

Why it matters:
- Makes robots accessible from natural-language interfaces
- Opens the door to rapid prototyping of AI-robot applications
- Works toward a common interface for safe AI ↔ robot communication

This is too big to develop alone — we’d love feedback, contributors, and partners from both the robotics and AI communities.

Comments (21)

Shrikrishna_Gad · 7h ago
Really impressive work! I love that you can set this up without touching any existing robot code: just start rosbridge and the MCP server, and the LLM can both control and observe the ROS system. It's like having a conversational ros2 CLI. The KUKA arm demo is particularly striking; the LLM can call gripper services in real time, all from natural language. One thing I'm curious about: could this setup coordinate multiple robots simultaneously, or is it still limited to a single robot per session?
r-johnv · 6h ago
Thank you!

The underlying stack can definitely connect to multiple robots simultaneously. Our current implementation is sequential: the language model connects to one robot, then moves to the next (and can come back).

But it is definitely possible for us to write it to be simultaneous as well.

awanacode · 6h ago
Really cool, the multi-robot angle got me thinking. For simultaneous setups, would that mean spinning up separate MCP clients (one per robot), or could a single client handle multiple connections in parallel, almost like an operator shell aware of multiple robot contexts?

On the flip side, how would you handle conflicting commands from multiple clients? Is it last-writer-wins, or do you envision some arbitration layer? It feels like orchestration + conflict resolution will be key if MCP is to scale beyond single-robot demos into fleet-level use.

r-johnv · 4h ago
Conflicting commands from multiple clients is one of the open questions at the moment.

We're also building out a technical steering committee to help guide our direction on topics like this. Safety is a big category where having direction from across the community will be important.

r-johnv · 4h ago
It should be possible to do it from one client. The MCP server would handle the parallel connections.

We're using websockets as the interface between the server and the robot itself, which, as far as I've been able to explore, does support simultaneous connections.
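
To sketch what one-client-many-robots could look like (hypothetical hosts and names, not our current code), the server would hold one rosbridge client per robot:

    import roslibpy

    # One rosbridge websocket client per robot, keyed by name
    robots = {
        'kuka_arm': roslibpy.Ros(host='10.0.0.11', port=9090),
        'go_dog': roslibpy.Ros(host='10.0.0.12', port=9090),
    }
    for client in robots.values():
        client.run()

    def publish(robot_name, topic_name, msg_type, payload):
        # The MCP tool call carries robot_name, so one LLM session
        # can address any connected robot
        chan = roslibpy.Topic(robots[robot_name], topic_name, msg_type)
        chan.publish(roslibpy.Message(payload))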

sayid_islam · 2h ago
Fascinating work!

Does this also extend to industrial (sector-agnostic) applications, where mitigating actions can be taken proactively via LLM-directed mitigation protocols, based on leading indicators from vision or other sensor data? Does it allow non-technical users to drive debugging or other similar mitigation actions?

k-warburton · 2h ago
This is a very intriguing application of physical AI. It is astounding to see demos of how simple human instructions can produce complex machine actions. What can we do to improve safety and protect assets or people in cases of misuse? I see some code contributions regarding permission controls, but are there other steps we can take to ensure this technology "understands" when the physical motion being requested is not appropriate because it might endanger people or expensive, hard-to-replace hardware?
avasan · 8h ago
Many general-purpose models (Claude included) sometimes hallucinate, stating one thing while outputting disconnected commands/code on the back end. Does the MCP try to correct or account for that? Or is user oversight necessary to ensure that actions match the LLM output? Is hitting a physical emergency stop the only way to halt aberrant operation, or can the LLM interface be used for that?
sdallagasperina · 6h ago
Great question! I’m one of the collaborators on the project. Right now, the MCP server doesn’t “correct” hallucinations itself, but it enforces a strict tool interface: the LLM can only call valid ROS topics, services, or actions that actually exist and that are explicitly exposed as safe to use. This information is provided through the MCP, so if the model hallucinates a command, the call simply fails gracefully rather than executing something unintended.

For more advanced use cases, we’re also thinking about adding validation layers and safety constraints before execution — so the MCP acts not just as a bridge, but also as a safeguard.
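
In spirit, the gating amounts to something like this (a simplified illustration, not the actual server code; ros_call is a hypothetical stand-in for the rosbridge wrapper):

    # Only services explicitly exposed in the config are callable
    ALLOWED_SERVICES = {'/gripper/open', '/gripper/close'}

    def call_service(name, request):
        if name not in ALLOWED_SERVICES:
            # A hallucinated service name fails gracefully with a
            # message the LLM can read and recover from
            return {'error': f'unknown or unexposed service: {name}'}
        return ros_call(name, request)  # hypothetical rosbridge wrapper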

r-johnv · 5h ago
Adding direct links to two of the videos of the MCP server in action.

This is the video of interacting with and debugging an industrial robot (the one a few other comments mention, where we see some amount of what looks like emergent behavior): https://www.youtube.com/watch?v=SrHzC5InJDA

This is a video from a collaborating research lab controlling a Unitree Go (robot dog): https://youtu.be/RW9_FgfxWzs?si=o7tIHs5eChEy9glI

awanacode · 6h ago
Hey guys, nice work! Finally someone is taking the bull by the horns.

What excites me most is the potential for MCP to help with diagnostics and deployment for non-developers. A lot of lab techs or operators don’t want to dive into ros2 topic hz or parse logs — they just want to ask simple questions like “why isn’t the arm responding?” or “is this topic publishing?”.

A natural language layer over ROS could make debugging and deployment way easier for non-technical users — almost like having a conversational ros2 doctor or ros2 launch.
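
Even the primitive behind "is this topic publishing?" is small enough to sketch; a hypothetical helper using roslibpy could be:

    import time
    import roslibpy

    def topic_rate(client, name, msg_type, window=2.0):
        # client: an already-connected roslibpy.Ros instance.
        # Counts messages on the topic for `window` seconds, returns Hz.
        count = 0

        def on_msg(_msg):
            nonlocal count
            count += 1

        chan = roslibpy.Topic(client, name, msg_type)
        chan.subscribe(on_msg)
        time.sleep(window)
        chan.unsubscribe()
        return count / window

The LLM could turn "why isn't the arm responding?" into a few calls like this and explain the numbers back in plain English.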

querist9 · 5h ago
Great, thanks for the nice comment.
lpigeon · 5h ago
I believe this project will play a significant role in helping to control robots using natural language!
avasan · 8h ago
This looks like very exciting work! I'm curious - how much pre-context did you need to provide Claude to operate the industrial robot? That looks like a very complex environment.
r-johnv · 8h ago
Thank you! This has been an exciting project to work on!

For the industrial robot (in the video on the main readme page) I intentionally gave Claude no context beforehand. All of the inferences you see there are based on information it got through the MCP tools, which let it analyze the ROS topics and services on the robot.

In fact, I had a starting prompt to ignore all context from previous conversations, because this looked like an example of emergent behavior and I wanted to confirm that it was not picking things up from my earlier conversations!

avasan · 5h ago
If you don't start from a clean slate, does the behavior change drastically? Does the interaction between the LLM layer and the MCP/ROS become more efficient if it already has some context? Would that be something you'd want to toggle, or do previous conversation contexts cause issues with new command implementations?
r-johnv · 5h ago
I'm actually reviewing a PR right now that gives the MCP server a library of common robots, with a specification file per robot containing important context for the language model.

The demo video with the Unitree Go (robot dog) uses this approach to give the LLM additional context about the custom poses available to it.
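
For a feel of the shape (field names here are hypothetical; the real format is whatever lands in that PR), a spec entry might look like:

    # Hypothetical library entry for a Unitree Go
    UNITREE_GO_SPEC = {
        'name': 'unitree_go',
        'ros_version': 2,
        'custom_poses': ['sit', 'stand', 'stretch', 'hello'],
        'safe_topics': ['/cmd_vel'],
        'notes': 'Quadruped; prefer gentle velocity ramps.',
    }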

r-johnv · 5h ago
Not starting from a clean slate improves the interaction significantly. I started from a clean slate in the industrial robot video to highlight how much is possible even when starting from one.
avasan · 5h ago
That makes this all the more impressive!! What happens when you get an incorrect interpretation, though? It is now in the "previous context" bucket. Assuming the user addresses the issue by talking through the LLM layer, do you think subsequent interactions could compound the error?

I sometimes face issues with LLMs running out of tokens or only using partial contexts from previous conversations, thereby repeating or compounding previous incorrect responses.

Any thoughts on how to tackle that? Or is that too abstract a problem/beyond the scope to address at the moment?

r-johnv · 5h ago
(Mentioning beforehand that we're still very early when it comes to the exact behavior of each language model)

So far, with Claude and Gemini, which are what we've been testing most, I've observed that the language model has been pretty good at recognizing a faulty initial interpretation once it queries more information from the system.

Running out of tokens is a more significant issue. We saw it a lot when we queried image topics, which led us to try writing better image interpreters within the MCP server itself (credit to my collaborators at Hanyang University in Korea) to defend the context window. Free tiers of the language models also run out of tokens quite quickly.
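
The idea behind defending the context window is simple: shrink or summarize image messages before they ever reach the model. Roughly (a sketch with Pillow, not the exact interpreter we wrote):

    import base64
    import io
    from PIL import Image

    def shrink_for_llm(jpeg_bytes, max_side=256, quality=60):
        # Downsample a camera frame so it costs far fewer tokens
        img = Image.open(io.BytesIO(jpeg_bytes))
        img.thumbnail((max_side, max_side))
        buf = io.BytesIO()
        img.save(buf, format='JPEG', quality=quality)
        return base64.b64encode(buf.getvalue()).decode('ascii')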

PS - Thank you for the questions, I'm enjoying talking about this here on HN with people who look at it critically and challenge us!

suninderminion · 5h ago
Great Work!