Show HN: I built an AI agent that turns ROS 2's turtlesim into a digital artist

29 ponta17 9 5/31/2025, 10:17:17 AM github.com ↗
I'm a grad student studying robotics, with a particular interest in the intersection of LLMs and mobile robots. Recently, I discovered how easily LangChain enables the creation of AI agents, and I wanted to explore how such agents could interact with simulated environments.

So, I built TurtleSim Agent, an AI agent that turns the classic ROS 2 turtlesim turtle into a creative artist.

With this agent, you can give plain English commands like “draw a triangle” or “make a red star,” and it will reason through the instructions and control the simulated turtle accordingly. I’ve included demo videos on GitHub. Behind the scenes, it uses an LLM to interpret the text, decide what actions are needed, and then call a set of modular tools (motion, pen control, math, etc.) to complete the task.

If you're interested in LLM+robotics, ROS, or just want to see a turtle become a digital artist, I'd love for you to check it out:

GitHub: https://github.com/Yutarop/turtlesim_agent

Looking ahead, I’m also exploring tools like LangGraph and MCP (Model Context Protocol) to see whether they might be better suited for more complex planning and decision-making tasks in robotics. If anyone here is familiar with either of these or working in this space, I’d love to connect or hear your thoughts.

Comments (9)

dpflan · 23h ago
Forgive me for asking, but I'm always curious about the definition of “agent”. What is an “agent” exactly? Is it a static prompt that is sent along with user input to an LLM service and then handles that response? And then it’s done? Is an agent a prompted LLM call? Or some entity that is changing its own prompt as it continues to exist?
karmakaze · 22h ago
It depends on how you look at it. If the output 'it' is a drawing, then the agent is the thing doing the drawing on the user's behalf. In more detail, the outputs are commands, so the agent would be whatever generates those commands from the user's input. E.g. a web browser is a user agent that makes requests and renders resources that the user specifies.
ponta17 · 22h ago
Thanks for the thoughtful question! The term “agent” definitely gets used in a lot of different ways, so I’ll clarify what I mean here.

In this project, an agent is an LLM-powered system that takes a high-level user instruction, reasons about what steps are needed to fulfill it, and then executes those steps using a set of tools. So it’s more than a single prompted LLM call — the agent maintains a kind of working state and can call external functions iteratively as it plans and acts.

Concretely, in turtlesim_agent, the agent receives an input like “draw a red triangle,” and then:

1. Uses the LLM to interpret the intent,
2. Decides which tools to use (like move forward, turn, set pen color),
3. Calls those tools step-by-step until the task is done.
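The three steps above can be sketched as a small loop. This is only an illustration under stated assumptions, not the project's actual code: the tool names (`set_pen_color`, `move_forward`, `turn_left`) are hypothetical, the tools just record calls instead of talking to turtlesim, and `scripted_policy` is a hard-coded stand-in for the LLM so the example runs without any API key.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, tuple]  # (tool name, positional arguments)


@dataclass
class TurtleLog:
    """Records tool calls instead of driving a real turtle."""
    calls: List[Action] = field(default_factory=list)


def make_tools(log: TurtleLog) -> Dict[str, Callable[..., str]]:
    """Build a registry of hypothetical tools that only log what they were asked to do."""
    def set_pen_color(r: int, g: int, b: int) -> str:
        log.calls.append(("set_pen_color", (r, g, b)))
        return "pen color set"

    def move_forward(distance: float) -> str:
        log.calls.append(("move_forward", (distance,)))
        return "moved"

    def turn_left(degrees: float) -> str:
        log.calls.append(("turn_left", (degrees,)))
        return "turned"

    return {"set_pen_color": set_pen_color,
            "move_forward": move_forward,
            "turn_left": turn_left}


def scripted_policy(instruction: str, history: list) -> Optional[Action]:
    """Stand-in for the LLM: returns the next tool call, or None when done.

    A real agent would send the instruction and the history to a model here.
    """
    plan: List[Action] = [("set_pen_color", (255, 0, 0))]
    for _ in range(3):  # three sides, 120-degree exterior turns
        plan.append(("move_forward", (2.0,)))
        plan.append(("turn_left", (120.0,)))
    return plan[len(history)] if len(history) < len(plan) else None


def run_agent(instruction: str, tools: Dict[str, Callable[..., str]],
              policy: Callable[[str, list], Optional[Action]],
              max_steps: int = 20) -> list:
    """Ask the policy for one tool call at a time and feed results back as history."""
    history = []
    for _ in range(max_steps):
        action = policy(instruction, history)
        if action is None:
            break
        name, args = action
        result = tools[name](*args)
        history.append((name, args, result))
    return history


log = TurtleLog()
history = run_agent("draw a red triangle", make_tools(log), scripted_policy)
for name, args, _ in history:
    print(name, args)
```

The `max_steps` cap is a design choice for the sketch: it keeps a misbehaving policy from looping forever.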

Hope that clears it up a bit!

paxys · 17h ago
To put it more simply, "agent" is now just a generic term to describe any middleware that sits between user input and a base LLM.
latchkey · 22h ago
This really brings back memories. The first computer language I learned as a child was Logo. My grandfather gifted me a lesson from a local computer store where someone came out to his house and sat with me in front of his Apple II.

I was too young to understand the concepts around the math of steps or degrees. While the thought of programming on a computer was amazing (and I later became an engineer), I couldn't grasp Logo, got frustrated, and lost interest.

If I could have had something like this, I'm sure it would have made more sense to me earlier on. It makes me think about how this will affect the learning rate in a positive way.

pj_mukh · 21h ago
Haha this is so incredibly cool.

One thing I might’ve missed: what is the “physics” of this universe? In the rainbow example the turtle seems to teleport between arcs?

ponta17 · 13h ago
Thanks! Great question.

TurtleSim itself doesn't simulate real-world physics — it allows instant position updates when needed. In this project, the goal was to create a digital turtle artist, not to replicate physical realism. So when the agent wants to draw something, it puts the pen down and moves physically (i.e., using velocity commands). But when it doesn't need to draw and just wants to move quickly to another position, it uses a teleport function I provided as a tool.

That's why in the rainbow example, you might see the turtle "jump" between arcs — it's skipping the movement to get to the next drawing point faster.
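A rough way to picture the two movement modes is a toy turtle in plain Python. This is a sketch, not the project's tool code: `SimTurtle`, `forward`, `teleport`, and `jump_to` are hypothetical names, and `forward` updates the position directly rather than publishing velocity commands. It also assumes that a teleport with the pen down would leave a line, so `jump_to` lifts the pen first; the thread does not say whether the real tool does this.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class SimTurtle:
    """Toy stand-in for the turtlesim turtle (no ROS involved)."""
    x: float = 0.0
    y: float = 0.0
    heading: float = 0.0  # radians, 0 points along +x
    pen_down: bool = True
    segments: List[Tuple[Point, Point]] = field(default_factory=list)

    def forward(self, distance: float) -> None:
        """Move along the current heading, drawing if the pen is down."""
        start = (self.x, self.y)
        self.x += distance * math.cos(self.heading)
        self.y += distance * math.sin(self.heading)
        if self.pen_down:
            self.segments.append((start, (self.x, self.y)))

    def teleport(self, x: float, y: float, heading: float = 0.0) -> None:
        """Instant position update; assumed to draw a line if the pen is down."""
        start = (self.x, self.y)
        self.x, self.y, self.heading = x, y, heading
        if self.pen_down:
            self.segments.append((start, (x, y)))

    def jump_to(self, x: float, y: float, heading: float = 0.0) -> None:
        """Lift the pen, teleport, then restore the previous pen state."""
        was_down = self.pen_down
        self.pen_down = False
        self.teleport(x, y, heading)
        self.pen_down = was_down


t = SimTurtle()
t.forward(2.0)       # drawn stroke
t.jump_to(5.0, 5.0)  # no stroke: the "jump" between shapes
t.forward(2.0)       # drawn stroke at the new location
print(len(t.segments), "segments drawn")
```

Only the two `forward` calls leave segments behind, which matches the visible "jump" between arcs.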

moffkalast · 18h ago
That's pretty cool, but I feel like all of the LLM integrations with ROS so far have sort of entirely missed the point in terms of useful applications. Endless examples of models sending bare-bones Twist commands do a disservice to what LLMs are good at; it's like swatting flies with a bazooka in terms of compute used, too.

Getting the robot to move from point A to point B is largely a solved problem with traditional probabilistic methods, while niches where LLMs are the best fit I think are largely still unaddressed, e.g.:

- a pipeline for natural language commands to high-level commands ("fetch me a beer" to [send nav2 goal to kitchen, get fridge detection from yolo, open fridge with moveit, detect beer with yolo, etc.])

- using a VLM to add semantic information to map areas, e.g. have the robot turn around 4 times in a room, and have the model determine what's there so it can reference it by location and even know where that kitchen and fridge are in the above example

- system monitoring, where an LLM looks at ros2 doctor, htop, topic hz, etc. and determines if something's crashed or isn't behaving properly, and returns a debug report or attempts to fix it with terminal commands

- handling recovery behaviours in general, since a lot of times when robots get stuck the resolution is simple, you just need something to take in the current situational information, reason about it, and pick one of the possible ways to resolve it
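For the first item in the list above, one possible shape for such a pipeline is to have the model emit a structured plan and check it against a fixed set of robot skills before anything runs. The sketch below is purely illustrative: the skill names, their argument sets, and the JSON layout are all assumptions made up for this example, and `llm_output` is a hand-written string standing in for a model response. Nothing here calls Nav2, YOLO, or MoveIt.

```python
import json
from typing import Dict, List, Set

# Hypothetical skill registry: skill name -> required argument names.
ALLOWED_SKILLS: Dict[str, Set[str]] = {
    "navigate_to": {"location"},
    "detect_object": {"label"},
    "open_container": {"target"},
    "pick": {"label"},
}


def validate_plan(raw: str) -> List[dict]:
    """Parse a JSON plan and reject anything outside the skill registry."""
    steps = json.loads(raw)
    if not isinstance(steps, list):
        raise ValueError("plan must be a JSON list of steps")
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {i}: expected an object")
        skill = step.get("skill")
        if not isinstance(skill, str) or skill not in ALLOWED_SKILLS:
            raise ValueError(f"step {i}: unknown skill {skill!r}")
        args = step.get("args", {})
        if not isinstance(args, dict) or set(args) != ALLOWED_SKILLS[skill]:
            raise ValueError(f"step {i}: wrong arguments for {skill!r}")
    return steps


# Hand-written stand-in for what a model might return for "fetch me a beer".
llm_output = """
[
  {"skill": "navigate_to", "args": {"location": "kitchen"}},
  {"skill": "detect_object", "args": {"label": "fridge"}},
  {"skill": "open_container", "args": {"target": "fridge"}},
  {"skill": "detect_object", "args": {"label": "beer"}}
]
"""

plan = validate_plan(llm_output)
for step in plan:
    print(step["skill"], step["args"])
```

Validating before dispatch means a hallucinated skill name fails loudly at planning time instead of reaching the robot.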

ponta17 · 12h ago
Thanks a lot for the thoughtful feedback — I really appreciate it!

I think there might be a small misunderstanding regarding how the LLM is actually being used here (and in many agent-based setups). The LLM itself isn’t directly executing twist commands or handling motion; it’s acting as a decision-maker that chooses from a set of callable tools (Python functions) based on the task description and intermediate results.

In this case, yes — one of the tools happens to publish Twist commands, but that’s just one of many modular tools the LLM can invoke. Whether it’s controlling motion or running object detection, from the LLM’s point of view it’s simply choosing which function to call next. So the computational load really depends on what the tool does internally — not the LLM’s reasoning process itself.

Of course, I agree with your broader point: we should push toward more meaningful high-level tasks where LLMs can orchestrate complex pipelines — and I think your examples (like fetch-a-beer or map annotation via VLMs) are spot-on.

My goal with this project was to explore that decision-making loop in a minimal, creative setting — kind of like a sandbox for LLM-agent behavior.

Actually, I’m currently working on something along those lines using a TurtleBot3. I’m planning to provide the agent with tools that let it scan obstacles via 3D LiDAR and recognize objects through image processing, so that it can make more context-aware decisions.

Really appreciate the push for deeper use cases — that’s definitely where I want to go next!