We've attacked 40+ AI tools, including ChatGPT, Claude, and Perplexity

3 points · lidangzzz · 9/15/2025, 4:22:17 AM · github.com ↗

Comments (1)

lidangzzz · 2h ago
We designed an adversarial attack method and used it to target more than 40 AI chatbots. The attack succeeded more than 90% of the time, including against ChatGPT, Claude, and Perplexity.

GitHub: https://github.com/lidangzzz/AIGuardPDF

The specific approach is to create PDFs that keep the original text but randomly break it into small fragments, while randomly inserting large blocks of off-topic text, several to dozens of times the volume of the original, rendered in a transparent white font. The result preserves the PDF's human readability while maximizing the chance of misleading large language models.
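To make the idea concrete, here is a minimal sketch of how such a PDF could be generated. The library (reportlab), the fragment sizes, the decoy ratio, and the layout are illustrative assumptions for this post, not the actual implementation in the AIGuardPDF repository.

```python
# Illustrative sketch only: reportlab, the fragment sizes, the decoy
# ratio, and the layout are assumptions, not AIGuardPDF's actual code.
import random
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def fragment(text, lo=3, hi=8):
    """Split text into small chunks of random length."""
    chunks, i = [], 0
    while i < len(text):
        n = random.randint(lo, hi)
        chunks.append(text[i:i + n])
        i += n
    return chunks

def make_adversarial_pdf(path, real_text, decoy_text, decoy_ratio=10):
    c = canvas.Canvas(path, pagesize=letter)
    width, height = letter
    c.setFont("Helvetica", 12)
    x, y = 72, height - 72
    # Decoy pool several to dozens of times the size of the real text.
    decoys = fragment(decoy_text * decoy_ratio)
    for chunk in fragment(real_text):
        # Invisible decoy: white text on white paper, drawn first so the
        # visible fragment paints over it. Text extractors see both.
        if decoys and random.random() < 0.8:
            c.setFillColorRGB(1, 1, 1)
            c.drawString(x, y, decoys.pop())
        # Visible fragment of the original text, in black on top.
        c.setFillColorRGB(0, 0, 0)
        c.drawString(x, y, chunk)
        x += c.stringWidth(chunk, "Helvetica", 12)
        if x > width - 72:            # wrap to the next line
            x, y = 72, y - 14
        if y < 72:                    # new page; graphics state resets
            c.showPage()
            c.setFont("Helvetica", 12)
            x, y = 72, height - 72
    c.save()

# e.g. make_adversarial_pdf("hotdogs.pdf", hotdog_text, ai_overview_text)
```

A naive extractor that walks the content stream in drawing order reads the decoy and real fragments interleaved, while a human reader sees only the black text.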

The image below shows results from our experiments with Claude and ChatGPT. The PDF we uploaded was an introduction to hot dogs, while the interfering text was an introduction to AI. In every trial, both Claude and ChatGPT were misled without exception.

Our test results show that the adversarial PDFs we generate remain readable to human users yet successfully mislead many popular AI agents and chatbots, including ChatGPT, Claude, and Perplexity. After reading an uploaded PDF, these systems not only misidentified the document's subject but also failed to read or understand the original text. Our attack success rate exceeded 90%.

After reviewing Roy Lee's Cluely, our team felt deeply concerned. The purpose of this experiment is to prompt scientists, engineers, educators, and security researchers in the AI community to take AI safety and privacy seriously. We hope to help define boundaries between humans and AI, and to protect the privacy and security of human documents, information, and intellectual property at minimal cost: a line that lets humans resist and refuse incursions by AI agents, crawlers, chatbots, and the like.

Our proposed adversarial method is not an optimal or final solution. Now that it is public, commercial chatbots and AI agents may begin using OCR, or hand-authoring many rules to filter out small fonts, transparent text, white text, and other noise, but that would greatly increase their cost of reading and understanding PDFs. Meanwhile, we will keep investing time and effort in adversarial techniques for images, video, charts, tables, and other formats, to help individuals, companies, and institutions establish human-sovereign zones that refuse AI intrusion.
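The kind of rule-based filtering we anticipate is easy to prototype but hard to make robust. A rough sketch of one such rule, using PyMuPDF as our illustrative choice (the library and thresholds are assumptions, not any vendor's actual pipeline), which drops near-white or tiny text spans:

```python
# Rough sketch of a span-level filter; PyMuPDF and the thresholds are
# illustrative assumptions, not any vendor's actual pipeline.
import fitz  # PyMuPDF

def visible_text(path, min_size=4.0, white_cutoff=0xF0):
    doc = fitz.open(path)
    kept = []
    for page in doc:
        for block in page.get_text("dict")["blocks"]:
            for line in block.get("lines", []):  # image blocks have no lines
                for span in line["spans"]:
                    rgb = span["color"]          # packed sRGB integer
                    r, g, b = (rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF
                    near_white = min(r, g, b) >= white_cutoff
                    if not near_white and span["size"] >= min_size:
                        kept.append(span["text"])
    doc.close()
    return " ".join(kept)
```

Each rule of this kind handles one trick; covering light-gray text, invisible render modes, or glyphs hidden behind images multiplies the rules, which is the cost increase noted above.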

We believe that in an era when AI-enabled cheating tools are increasingly widespread, our method can help humans defend information security, whether in exams and interviews or in protecting corporate files and intellectual property. We also believe that defending information security is itself one of the most important topics in AI ethics.