I've found the opposite of this to be the case. I've spent some time recently debugging a chatbot - when it goes astray I ask 'what part of your system prompt made you do that?' and it responds with the line that drove the decision. Then I clarify my intent, ask it for better phrasing, and fix the prompt. This usually works.
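For what it's worth, the loop is nothing fancy. Here's a minimal sketch of the "ask the bot to blame its own prompt" step, assuming the OpenAI Python SDK - the model name, system prompt, and conversation are made-up placeholders, not anything from my actual setup:

    # Minimal sketch: after an unexpected answer, ask the model which
    # system-prompt line drove the decision. Prompt text and model name
    # below are hypothetical placeholders.
    from openai import OpenAI

    client = OpenAI()

    SYSTEM_PROMPT = "You are a support bot. Always escalate billing questions."  # hypothetical

    history = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "My invoice is wrong."},
        # the reply that went astray
        {"role": "assistant", "content": "I've escalated this to billing."},
        # the debugging question
        {"role": "user", "content": "What part of your system prompt made you do that? Quote the line."},
    ]

    resp = client.chat.completions.create(model="gpt-4o", messages=history)
    print(resp.choices[0].message.content)  # usually quotes the offending line

Once it quotes the line, I paste in what I actually meant and ask it to suggest a rewording, then update the prompt.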
This also often works with tool use and tool calls - just ask it what part of its prompt told it to do something and it can usually point there.
If you ask it why it believed something a priori that turned out to be wrong, the bot can't answer, and neither could I. If you ask me why I wrote some code, I can walk you through the steps that got me there. But if you ask me why I believed a function existed that, at runtime, I learned doesn't actually exist, I can't provide a justification either.