Indirect Prompt Injection Attacks Against LLM Assistants

5 points by lootsauce | 9/3/2025, 7:20:06 PM | 2 comments | schneier.com

Comments (2)

lootsauce · 4h ago
The recent DEFCON talk for the referenced paper: https://www.youtube.com/live/pleLhJRW9Fw

The risks in LLM-powered systems seem like an opened Pandora's box the more I look into mitigating them.

Terr_ · 3h ago
I've been comparing LLMs to asbestos:

1. The thing has some amazing and near-irreplaceable aspects.

2. There are a few ways to use it that are safe or worth the risk.

3. ... But for someone to get filthy rich selling it everywhere, there have to be lots of horribly dangerous applications that cause insidious long-term damage other people will be stuck with.

_________

For example, researchers have found that some LLMs will generate documents in which a character faces harsher punishment solely because the grammar of the character's dialogue is associated with a particular ethnicity.

While I would damn well hope we never use LLMs as legal judges, imagine the same pattern when people use them to summarize your resume, characterize your college application, or even screen a dating profile, and how those effects can add up to change the arc of victims' lives and livelihoods.

Some quick napkin math: imagine 100m adult Americans with average lifetime earnings of $1.7m. If LLM bias affects just 1% of them (1m people), causing each to lose 1% of their earnings ($17k), that's still $17 billion in harm.
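
As a sanity check on that arithmetic, here is a tiny Python sketch; the population, earnings, and percentage figures are just the illustrative assumptions from the comment above, not real statistics:

    # Napkin-math sketch of the harm estimate above (illustrative assumptions only)
    adults = 100_000_000            # assumed adult population
    lifetime_earnings = 1_700_000   # assumed average lifetime earnings, USD
    affected_share = 0.01           # assume 1% of people are affected by LLM bias
    earnings_loss = 0.01            # assume each loses 1% of lifetime earnings

    affected_people = adults * affected_share             # 1,000,000 people
    loss_per_person = lifetime_earnings * earnings_loss   # $17,000 per person
    total_harm = affected_people * loss_per_person        # $17,000,000,000

    print(f"${total_harm:,.0f} in aggregate harm")        # prints $17,000,000,000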