Verification, the Key to AI (2001)

34 points · anjneymidha · 5 comments · 5/9/2025, 5:10:28 AM · incompleteideas.net

Comments (5)

jrvarela56 · 5h ago
This applies to coding agents. If the agent can't run the code, it's unlikely to produce working code. And beyond just running it: linting, tests, compiling, code review, and any other tool or process humans use to check whether software is 'good' or working.

If the agent can apply these processes to its output, then we're on our way to getting a good chunk of our work done for us. Even from the product POV, if the agent is allowed to experiment by making deployments and checking user-facing metrics, it could eventually build a software product - but we should solve the coding part first, since it seems easier to verify objectively and quickly.
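
A minimal sketch of that check loop in Python (the ruff/pytest commands and the generate_patch callback are placeholders for whatever tools and model call the agent actually uses):

    import subprocess

    def run_check(cmd, cwd):
        # Run one verification step (lint, tests, build, ...) and capture its output.
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    def verify(repo_dir):
        # Each command stands in for whatever the project actually uses.
        checks = [
            ["ruff", "check", "."],            # lint
            ["python", "-m", "pytest", "-q"],  # tests
        ]
        feedback = []
        for cmd in checks:
            ok, output = run_check(cmd, repo_dir)
            if not ok:
                feedback.append(f"$ {' '.join(cmd)}\n{output}")
        return feedback  # empty list == every check passed

    def agent_loop(repo_dir, generate_patch, max_iters=5):
        # generate_patch(feedback) is the model call: it edits files in repo_dir
        # and sees the verifier's error output on the next round.
        feedback = []
        for _ in range(max_iters):
            generate_patch(feedback)
            feedback = verify(repo_dir)
            if not feedback:
                return True  # lints clean and tests pass
        return False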

jgalt212 · 3h ago
You're right, but actually running the code can be destructive (even when it runs as intended). You really need to be careful about dev environments: even the intended destructive operations will cost you time (and money) in resetting the dev environment.
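
One cheap partial mitigation, sketched in Python below, is to let the agent run its checks only against a throwaway copy of the project with a hard timeout, so a destructive run trashes nothing but the copy (the test command is just an example, and this protects the checkout only, not networks or shared databases; real isolation needs containers/VMs and scoped credentials):

    import shutil, subprocess, tempfile
    from pathlib import Path

    def run_in_sandbox(repo_dir, cmd=("python", "-m", "pytest", "-q"), timeout=120):
        # Copy the project into a temporary directory so anything the code
        # deletes or rewrites only affects the disposable copy.
        with tempfile.TemporaryDirectory() as tmp:
            sandbox = Path(tmp) / "work"
            shutil.copytree(repo_dir, sandbox)
            try:
                proc = subprocess.run(list(cmd), cwd=sandbox, capture_output=True,
                                      text=True, timeout=timeout)
                return proc.returncode, proc.stdout + proc.stderr
            except subprocess.TimeoutExpired:
                return -1, f"timed out after {timeout}s"
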
jbellis · 3h ago
Wow, 2001. Legitimately prescient.

And verification ("evaluation", as we call it now) really is the key, although most people working on "AI apps" haven't figured that out yet.

Follow Hamel to catch up on the state of the art: https://x.com/HamelHusain

a3w · 6h ago
Nice. LLMs can barely prove anything beyond citing some sources or reproducing pure math that already circulates. AFAICT, no novel ideas have been proven so far, i.e. the "these systems never invented anything" paradox has held for three years now.

Symbolic AI seems to prove everything it states, but it never produces novel ideas either.

Let's see if we get a neurosymbolic AI that can do something neither could do on its own. I doubt it; AI might just be a doom cult after all.

tasuki · 4h ago
You can use an external proving mechanism and feed the results to the LLM.

A sufficiently rich type system (think Idris rather than C) or a sufficiently powerful test suite (e.g. property-based tests) should do the trick.
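
For the test-suite route, property-based tests make a cheap external verifier: the model's code either survives a pile of generated inputs or the failing case goes straight back into the prompt. A minimal sketch with Python's hypothesis library (sort_numbers is a stand-in for whatever the model produced):

    from hypothesis import given, strategies as st

    def sort_numbers(xs):
        # Stand-in for model-generated code under verification.
        return sorted(xs)

    @given(st.lists(st.integers()))
    def test_sort_is_ordered_and_a_permutation(xs):
        out = sort_numbers(xs)
        # Property 1: output is non-decreasing.
        assert all(a <= b for a, b in zip(out, out[1:]))
        # Property 2: output is a permutation of the input.
        assert sorted(out) == sorted(xs)

Run it with python -m pytest; any counterexample hypothesis finds is exactly the kind of concrete failure worth feeding back to the LLM.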