The source code for a pre-release version got leaked a while ago (they forgot to remove the embedded source map) and if you can find it, it’s definitely worth looking into.
therein · 1h ago
It is an interesting read. I can imagine a future where the "tools" we make available become numerous enough and poorly thought out enough that an AI could actually figure out how to escalate privileges and execute stuff outside the defined security boundaries by combining them.
It isn't hard to think of a simple example in which Claude.md can be written to by the LLM to allow accessing endpoints not whitelisted by the user by smuggling a base64 encoded payload that then gets decoded by a subroutine it wrote to a file without you noticing. Or realizing it can't use the WebFetchTool but it can write a script to do manual DNS resolution and then use bash TCP sockets instead of curl in case it is hardened to not be able to use curl.
throwaway0665 · 1m ago
Cursor has basically run into this exact thing. It figured out it can read .env files by running other tools despite the file being "blocked": https://github.com/getcursor/cursor/issues/2546
lobochrome · 5m ago
I see this behavior all the time. When it can’t read a file using its read tool - it escalates up to try with bash. Often it tries to search the entire file system “find / …”
It isn't hard to think of a simple example in which Claude.md can be written to by the LLM to allow accessing endpoints not whitelisted by the user by smuggling a base64 encoded payload that then gets decoded by a subroutine it wrote to a file without you noticing. Or realizing it can't use the WebFetchTool but it can write a script to do manual DNS resolution and then use bash TCP sockets instead of curl in case it is hardened to not be able to use curl.