Why do LLMs still not run code before giving it to you?

1 highfrequency 2 8/3/2025, 7:58:37 PM
The leading models all advertise tool use, including code execution. So why is it still common to receive a short Python script containing a logical bug that running it through a Python interpreter for 0.1 seconds would immediately reveal? Is it a safety concern, or the difficulty of sandboxing in a VM? Surely it is not a resource consumption issue, given the price of a single CPU core vs. a GPU.
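To make the kind of check I mean concrete, here is a minimal sketch that runs a generated script in a fresh interpreter process with a timeout. The `smoke_test` helper and the `generated` snippet are hypothetical, made up for illustration. This is not a sandbox (the child process has the same permissions as the caller), and it only catches bugs that raise or hang; a script that quietly prints the wrong answer would still need assertions or a known expected output.

```python
import subprocess
import sys


def smoke_test(source: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Run `source` in a fresh Python interpreter and return (ok, output).

    Hypothetical helper: NOT a sandbox, the child runs with the caller's
    permissions. It only reports crashes, non-zero exits, and timeouts.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],  # same interpreter as the caller
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s} s"
    ok = proc.returncode == 0
    # On success return stdout; on failure return the traceback from stderr.
    return ok, proc.stdout if ok else proc.stderr


# Made-up "model output" with an off-by-one bug: the last index is out of range.
generated = "xs = [1, 2, 3]\nprint(sum(xs[i] for i in range(len(xs) + 1)))\n"
ok, output = smoke_test(generated)
```

Here `ok` comes back false and `output` holds the traceback, which is exactly the feedback a model could use to fix the script before replying.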

Comments (2)

tlb · 50m ago
Is it a common use case to produce a standalone program that could be tested in isolation? Usually I'm asking for a function (or just a few lines of change) that depends on the rest of my code & environment, so it's not trivial to test.
chasing0entropy · 1h ago
Sounds like an opportunity for you to make the world better by designing the process and implementing it.
