AI Meets WinDBG

154 points by thunderbong · 26 comments · 5/5/2025, 5:11:51 AM · svnscha.de ↗

Comments (26)

danielovichdk · 4h ago
Claiming to use WinDBG for debugging a crash dump, and the only commands I can find in the MCP code are these? I am not trying to be a dick here, but how does this really work under the covers? Is the MCP learning WinDBG? Is there a model that knows WinDBG? I am asking because I have no idea.

        results["info"] = session.send_command(".lastevent")
        results["exception"] = session.send_command("!analyze -v")
        results["modules"] = session.send_command("lm")
        results["threads"] = session.send_command("~")
You cannot debug every crash dump with only these 4 commands.
psanchez · 4h ago
It looks like it is using the Microsoft Console Debugger (CDB) as the interface to WinDBG.

Just had a quick look at the code: https://github.com/svnscha/mcp-windbg/blob/main/src/mcp_serv...

I might be wrong, but at first glance I don't think it is only using those 4 commands. It might be using them internally to get context to pass to the AI agent, but it looks like it exposes:

    - open_windbg_dump
    - run_windbg_cmd
    - close_windbg_dump
    - list_windbg_dumps
The most interesting one is "run_windbg_cmd", because it might allow the MCP server to send whatever the AI agent wants. E.g.:

    elif name == "run_windbg_cmd":
        args = RunWindbgCmdParams(**arguments)
        session = get_or_create_session(
            args.dump_path, cdb_path, symbols_path, timeout, verbose
        )
        output = session.send_command(args.command)
        return [TextContent(
            type="text",
            text=f"Command: {args.command}\n\nOutput:\n```\n" + "\n".join(output) + "\n```"
        )]

(edit: formatting)
gustavoaca1997 · 4h ago
I think the magic happens in the "run_windbg_cmd" function. AFAIK, the agent will use that function to pass any WinDBG command the model thinks will be useful. The implementation is basically the interface between the model and the actual CDB invocation, via CDBSession.
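
For a rough idea of what a CDBSession-style wrapper could look like: the sketch below drives cdb over stdin/stdout and uses a `.echo` sentinel to know when a command's output ends. The class name, sentinel trick, and structure are assumptions for illustration, not the project's actual code (only `-z` for opening a dump and `.echo` are real cdb features):

```python
import subprocess

SENTINEL = "___CMD_DONE___"

def split_at_sentinel(lines, sentinel=SENTINEL):
    """Return the output lines that precede the sentinel echo."""
    out = []
    for line in lines:
        if sentinel in line:
            break
        out.append(line)
    return out

class CDBSession:
    """Hypothetical sketch: drive cdb.exe over pipes, one session per dump."""

    def __init__(self, dump_path, cdb_path="cdb.exe"):
        self.proc = subprocess.Popen(
            [cdb_path, "-z", dump_path],  # -z opens a crash dump file
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def send_command(self, command):
        # Echo a sentinel after the command so we can tell where its output ends.
        self.proc.stdin.write(f"{command}\n.echo {SENTINEL}\n")
        self.proc.stdin.flush()
        lines = []
        for line in iter(self.proc.stdout.readline, ""):
            lines.append(line.rstrip("\n"))
            if SENTINEL in line:
                break
        return split_at_sentinel(lines)
```

The point is that the "AI" part is entirely outside this class: it just shuttles text between the model and the debugger.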
eknkc · 2h ago
Yeah, that seems correct. It's like creating an SQLite MCP server with a single tool, "run_sql". Which is just fine, I guess, as long as the LLM knows how to write SQL (or WinDBG commands), and they definitely do. I'd even say this is better, because it shifts the capability to the LLM instead of the MCP server.
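
To make the analogy concrete, a single-tool SQLite handler can be tiny — the model writes the SQL, the server only relays it. This is a hedged sketch of the pattern, not any real MCP server's code; the function name and response shape are assumptions:

```python
import sqlite3

def handle_tool_call(conn, name, arguments):
    """Single-tool dispatch: the LLM writes the SQL, the server just runs it."""
    if name != "run_sql":
        raise ValueError(f"unknown tool: {name}")
    cursor = conn.execute(arguments["sql"])
    rows = cursor.fetchall()
    # Return results as plain text, the way MCP tool results are typically returned.
    return "\n".join(repr(row) for row in rows)

# Usage: the model supplies arbitrary SQL; the server only relays it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crashes (module TEXT, count INTEGER)")
conn.execute("INSERT INTO crashes VALUES ('ntdll.dll', 3)")
result = handle_tool_call(conn, "run_sql", {"sql": "SELECT * FROM crashes"})
```

All the intelligence (and all the risk of running arbitrary commands) lives on the model's side of that interface.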
anougaret · 3h ago
this is pretty cool but ultimately it won't be enough to debug real bugs that are nested deep within business logic or that happen because of long chains of events across multiple services/layers of the stack

imo what AI needs to debug is either:

- train it with RL to use breakpoints + a debugger, or to do print debugging, but that'll suck because the chains of action are super freaking long, and we know how it goes with AI memory currently: it's not great

- a sort of omniscient debugger, always on, that can inform the AI of everything the program/services did (sentry-like observability, but on steroids). The AI would then just search within that and find the root cause

neither approach is going to be easy to make happen, but imo if we all spend 10+ hours every week debugging, it's worth a shot

that's why I'm currently working on approach 2. I made a time travel debugger/observability engine for JS/Python, and I'm now working on plugging it into AI context as efficiently as possible, so that one day it hopefully debugs even super long sequences of actions in dev & prod

it's super WIP and not self-hostable yet but if you want to check it out: https://ariana.dev/

ehnto · 3h ago
I think you hit the nail on the head, especially for deeply embedded enterprise software. The long action chains and the time taken to set up debugging scenarios are what make debugging time consuming. Solving the inference side of things would be great, but I feel it takes too much knowledge that lives in neither the codebase NOR the LLM to actually make an LLM useful once you are set up with a debugging state.

Like you said, running over a stream of events, states, and data for that debugging scenario is probably way more helpful. It would also be great to prime the context with business rules and history for the company. Otherwise LLMs will make the same mistake devs make: not knowing the "why" of something and thinking the "what" is most important.

rafaelmn · 3h ago
Frankly, this kind of stuff getting upvoted makes HN less and less valuable as a news source - this is yet another "hey, I trivially exposed something to the LLM and got some funny results on a toy example".

These kinds of demos were cool 2 years ago - then we got function calling in the API, it became super easy to build this stuff, and the reality hit that LLMs were kind of shit and unreliable at using even the most basic tools. Like, oh wow, you can get a toy example working and suddenly it's a "natural language interface to WinDBG".

I am excited about progress on this front in any domain - but FFS, show actual progress or something interesting. Show me an article like this [1] where the LLM did anything useful. Or just show what you did that's not "oh, I built a wrapper on a CLI" - did you fine-tune the model to get better performance? Did you compare which model performs better by setting up a benchmark and find one to be impressive?

I am not shitting on OP here because it's fine to share what you're doing and get excited about it - maybe this is step one, but why the f** is this a front page article ?

[1] https://cookieplmonster.github.io/2025/04/23/gta-san-andreas...

anougaret · 3h ago
yeah, it is still truly hard and rewarding to do deep, innovative software, but everyone is regressing to the mean, rushing to low-hanging fruit, and just plugging old A into new B in the hopes it makes them VC money or something

real, quality AI breakthroughs in software creation & maintenance will require a deep rework of many layers of the software stack, low and high level.

kevingadd · 1h ago
fwiw, WinDBG actually has support for time-travel debugging. I've used it before quite successfully, it's neat.
anougaret · 40m ago
usual limits of debuggers = barely usable to debug real scenarios
lowleveldesign · 4h ago
I do a lot of Windows troubleshooting and am still thinking about incorporating AI into my work. The posted project looks interesting, and it's impressive how fast it was created. Since it's using MCP, it should be possible to bind it to local models. I wonder how performant and effective that would be. When working in the debugger, you should be careful with what you send to external servers (for example, Copilot). Process memory may contain unencrypted passwords, usernames, domain configuration, IP addresses, etc. Also, I don't think that vibe-debugging will work without knowing what the eax register is or how to navigate the stack/heap. It will solve some obvious problems, such as most exceptions, but for anything more demanding (bugs in application logic, race conditions, etc.), you will still need to get your hands dirty.

I am actually more interested in improving the debugger interface. For example, an AI assistant could help me create breakpoint commands that nicely print function parameters when I only partly know the function signature and do not have symbols. I used Claude/Gemini for such tasks and they were pretty good at it.
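
As a sketch of what such an assistant-drafted command could look like (the function, the x86 stack offset, and the format specifier are illustrative assumptions that depend on the target's calling convention), the breakpoint string could be built and then sent through a tool like run_windbg_cmd:

```python
# Hypothetical example: a breakpoint that prints a function's first
# (wide-string) argument on each hit, then continues.
func = "kernelbase!CreateFileW"
# Inside the bp command string, the quotes around the .printf format
# must be backslash-escaped for WinDBG's parser; %mu prints a
# null-terminated Unicode string, and gc resumes execution.
action = r'.printf \"File: %mu\n\", poi(@esp+4); gc'
bp_command = f'bp {func} "{action}"'
# bp_command could then be passed as the "command" argument of run_windbg_cmd.
```

Getting the escaping and the argument offsets right is exactly the fiddly part an assistant is good at drafting, though the result still needs checking against the actual ABI.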

As a side note, I recall Kevin Gosse also implemented a WinDbg extension [1][2] which used OpenAI API to interpret the debugger command output.

[1] https://x.com/KooKiz/status/1641565024765214720

[2] https://github.com/kevingosse/windbg-extensions

JanneVee · 2h ago
> Crash dump analysis has traditionally been one of the most technically demanding and least enjoyable parts of software development.

I, for one, enjoy crash dump analysis because it is a technically demanding, rare skill. I know I'm an exception, but I enjoy actually learning the stuff so I can deterministically produce the desired result! I even apply that to other parts of the job, like actually learning the currently used programming language and reading the documentation of libraries/frameworks, instead of copy-pasting solutions from the "shortcut du jour" - Stack Overflow yesterday, LLMs today!

criddell · 14m ago
Are you using WinDbg? What resources did you use to get really good at it?

Analyzing crash dumps is a small part of my job. I know enough to examine exception context records and associated stack traces, and 80% of the time that's enough. Bruce Dawson's blog has a lot of great stuff, but it's pretty advanced.

I’m looking for material to help me jump that gap.

Tepix · 3h ago
Sounds really neat!

How does it compare to using the Ghidra MCP server?

cjbprime · 2h ago
Ghidra's a decompiler and WinDBG is a debugger, so they'd be complementary.
cadamsdotcom · 5h ago
Author built an MCP server for windbg: https://github.com/svnscha/mcp-windbg

It knows plenty of arcane commands in addition to the common ones, which is really cool & lets it do amazing things for you, the user.

To the author: most of your audience knows what MCP is, may I suggest adding a tl;dr to help people quickly understand what you've done?

indigodaddy · 5h ago
My word, that's one of the most beautiful sites I've ever encountered on mobile.


Zebfross · 5h ago
Considering AI is trained on the average human experience, I have a hard time believing it would be able to make any significant difference in this area. The best experience I've had debugging at this level was using Microsoft's time travel debugger, which allows stepping both forward and back.
cjbprime · 2h ago
You should try AI sometime. It's quite good, and can do things (like "analyze these 10000 functions and summarize what you found out about how this binary works, including adding comments everywhere") that individual humans do not scale to.
voidspark · 1h ago
It can analyze a crash dump in 2 seconds - something that could take hours for an experienced developer, or be impossible for the "average human".