The main problem with regular (forward-only) debugging is that the state -- memory, CPU registers, cache, etc. -- that contributed to the bug is completely lost. With time travel debugging that state can be saved, which is great, but now you have a bunch of data you need to sift through as you trace the bug. AI seems like the right tool to save you that drudgery and get to the root cause sooner (or to work on it while you do other things in parallel).
This is new: something that wouldn't have been possible without both time travel debugging and the latest AI tech (MCP, code LLMs).
It would be interesting to know what challenges came up in nudging the model to work well with time travel debug data, since this kind of data is novel and today's models might not be well trained to make use of it.
mark_undoio · 4h ago
> It would be interesting to know what challenges came up in nudging the model to work well with time travel debug data, since this kind of data is novel and today's models might not be well trained to make use of it.
This is actually quite interesting - it's something I'm planning to make a future post about.
But basically the LLM seems to be fairly good at using this interface effectively, so long as we tune which tools we provide quite carefully:
* Where we would want the LLM to use a tool sparingly, it was better not to provide it at all. When you have time travel debugging it's usually better to work backwards, since that tells you the causality of the bug. If we gave Claude the ability to step forward it tended to use it for everything, even when it wasn't appropriate. (There's a rough sketch of this kind of curated tool surface after this list.)
* LLMs weren't great at managing state they'd set up. Allowing the LLM to set breakpoints just confused it later when it forgot they were there.
* Open-ended commands were a bad fit. For example, a time travel debugger can usually jump around in time according to an internal timebase. If the LLM was given unconstrained access to that, it tended to waste lots of effort guessing timebases and looking to see what was there.
* Sometimes the LLM just wants to hold something the wrong way and you have to let it. It was almost impossible to get the AI to understand that it could step back into a function on the previous line. It would always try going to the line, then stepping back, resulting in an overshoot. We had to just adapt the tool so that it could use it the way it thought it should work (the second sketch below shows one way that adaptation could look).
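To make that concrete, here's a rough, purely illustrative sketch of what a deliberately narrow tool surface like this could look like. The helpers (register_tool, run_udb_command) and the tool names are made up for the example and aren't our actual MCP server; the underlying commands (reverse-next, reverse-finish, watch, reverse-continue) are standard GDB-style reverse-execution commands.

```python
# Illustrative sketch only: a deliberately narrow tool surface for a
# time-travel debugger. register_tool / run_udb_command are made-up helpers
# for this example, not a real MCP SDK or an actual debugger integration.
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}

def register_tool(name: str):
    """Expose a function to the LLM as a named tool."""
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

def run_udb_command(cmd: str) -> str:
    """Send a GDB-style command to the recorded process (stubbed for the sketch)."""
    print(f"(udb) {cmd}")
    return f"<output of `{cmd}`>"

# Backwards-only stepping: no forward-step tool is registered at all,
# so the model can't reach for it when it shouldn't.
@register_tool("step_back")
def step_back() -> str:
    """Go back one source line, stepping over function calls."""
    return run_udb_command("reverse-next")

@register_tool("reverse_finish")
def reverse_finish() -> str:
    """Run backwards to just before the current function was called."""
    return run_udb_command("reverse-finish")

# No breakpoint tools and no raw "jump to timebase" tool. Instead, a bounded
# question the model actually wants answered; any debugger state needed to
# answer it is created and cleaned up inside the one call, so the model never
# has to track it.
@register_tool("last_write_to")
def last_write_to(expression: str) -> str:
    """Run backwards to the most recent write to `expression`."""
    run_udb_command(f"watch -l {expression}")   # transient watchpoint
    out = run_udb_command("reverse-continue")   # hit it going backwards
    run_udb_command("delete")                   # server-side cleanup, not the LLM's job
    return out
```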
The overall result is actually quite satisfactory but it was a bit of a journey to understand how to give the LLM enough flexibility to generate insights without letting it get itself into trouble.
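And for the last bullet, one way the "let it hold the tool the way it wants" adaptation could look, reusing the made-up helpers from the sketch above: expose stepping back into the previous line's call as a single atomic tool, so the model never has to compose "go to the line" with "step back" itself.

```python
# Illustrative only, reusing register_tool / run_udb_command from the sketch above.
@register_tool("step_back_into_previous_call")
def step_back_into_previous_call() -> str:
    """Step backwards into the function called by the previous source line.

    One atomic operation, matching the model's mental model, so it never
    has to combine "go to line N-1" with "step back" (which is where it
    overshot).
    """
    # GDB-style reverse execution: reverse-step follows calls backwards;
    # exactly where it stops inside the callee depends on the debugger.
    return run_udb_command("reverse-step")
```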