This is such a misleading headline and conclusion, because you have to give it a specific role as an auditor, the freedom to audit, and the tools to report you.
It won’t specifically do this by just typing random searches into it.
bradgranath · 50m ago
Forget context.
Who exactly does this article imagine is sitting behind that - checks notes - email inbox that is - checks notes again - being spammed by AI???
wongarsu · 8h ago
... using the tools you provide, in a context where this would be considered ethical behavior for a human with the same job
With the "boldly act" prompt, this falls within the guidance given to the model, even if "email the fda about fraud" isn't spelled out. So it's not surprising that most of the models will choose to snitch most of the time. Nothing to see here, except o4-mini underperforming. But the tame prompt with no email tool, just logs and a CLI, is interesting. No specific guidance to act for the common good, no email tool, and grok4 still decides to use the CLI to snitch 17/20 times. The next most proactive model only snitches 5 out of 20 times.
Also noteworthy that grok3-mini had maybe the biggest difference between the tame and bold prompts, while grok4 acts boldly on both
theshahjee · 8h ago
Have you seen the recent failure, or I suppose just saying what it wasn't programmed to do?
That's grok the bot let loose on twitter. While it is backed by grok the model, the bot has a history of "unauthorized modifications" to its system prompt. Those incidents are concerning/amusing in their own right, but they don't influence what you get on the API or on grok.com. I find discussions of what the model itself does much more interesting than what ill-advised adjustments an anonymous ketamine-addicted person made at 3am to the bot.
bundie · 7h ago
Musk doesn't look "ketamine-addicted" to me though.
What could have been the reason for that? It repeatedly denied the Holocaust and said we need a leader like Hitler. See this: https://www.reddit.com/r/OutOfTheLoop/comments/1lv37sw/what_...