Show HN: Find prompts that jailbreak your agent (open source)

6 theHolyTrynity 0 5/22/2025, 9:15:52 AM security.vista-labs.ai ↗

We've built an open-source tool to stress test AI agents by simulating prompt injection attacks.

We’ve implemented one powerful attack strategy based on the paper [AdvPrefix: An Objective for Nuanced LLM Jailbreaks](https://arxiv.org/abs/2412.10321).

Here's how it works:

- You define a goal, like: “Tell me your system prompt” - Our tool uses a language model to generate adversarial prefixes (e.g., “Sure, here are my system prompts…”) that are likely to jailbreak the agent. - The output is a list of prompts most likely to succeed in bypassing safeguards.

We’re just getting started. Our goal is to become the go-to toolkit for testing agent security. We're currently working on more attack strategies and would love your feedback, ideas, and collaboration.

Try it at: https://security.vista-labs.ai/

Docs with how to: https://hackagent.dev/docs/intro

GitHub: https://github.com/vistalabs-org/hackagent

video demo with example: https://www.loom.com/share/1e4ce025ea4749fab169195e7b1222ba

Would love to hear what you think!

Metrics Are Easy–Impact Is Hard (eleganthack.com)

Launch HN: WorkDone (YC X25) – AI Audit of Medical Charts

Accelerating Docker Builds by Halving EC2 Boot Time (depot.dev)

Bayesian Cognition and the Future of Human-AI Interaction (learnbayesstats.com)

Show HN: CLI Quote Saver

Coinbase Data Breach Notification (maine.gov)

Archaeologist sailing like a Viking makes unexpected discoveries (phys.org)

Are groundbreaking science discoveries becoming harder to find? (nature.com)

By putting AI into everything, Google wants to make it invisible (technologyreview.com)

TypeScript Native Previews (devblogs.microsoft.com)

Show HN: Whenish – Plan Group Events in iMessages (apps.apple.com)

PureLanding: Shadcn open-source landing page template (github.com)

Make Your AI SaaS in a Weekend with MkSaaS Boilerplate (mksaas.com)

Nonprofit used AI to document 77M miles of unmapped waterways (businessinsider.com)

A Crypto Billionaire Who Feared Arrest in the U.S. Returns for Dinner with Trump (wsj.com)

AMD Announces Threadripper HEDT and Pro 9000-Series CPUs: 96 cores, 192 threads (tomshardware.com)

Aceternity: Shadcn Component Library (ui.aceternity.com)

Google Unveils Video Overviews for NotebookLM (ikkaro.net)

The Weird World of Animal-Robot Research (thereader.mitpress.mit.edu)

Open-Source GenBI Solution (github.com)

AI coding startup Replit CEO says companies soon won't need software developers (semafor.com)

Treasury Sounds Death Knell for Penny Production (wsj.com)

The Treasury unveils its plan to kill the penny (cnn.com)

Write a Template Compiler for Erlang (2009) (evanmiller.org)

Autonomous Surgical Robots Enhance Precision in the OR (spectrum.ieee.org)

The Wormworld Saga (wormworldsaga.com)

Flat optical metasurfaces in conductive plastics achieve 10x performance boost (phys.org)

How to thrive as a junior engineer: Tips and insights (github.blog)

Social media platforms: what's wrong, and what's next (scottgoci.com)

Hollywood's New Studio System Is Being Built by Creators Themselves (hollywoodreporter.com)

Projection Connections: A Nerdy Poster (somethingaboutmaps.wordpress.com)

One Driver, Two Trucks: Is This the Future of Freight? (spectrum.ieee.org)

Scrape once, cache, and share from then on (blat.ai)

Suspected InfoStealer Malware Data Breach Exposed 184M Logins/Passwords (websiteplanet.com)

Virtual flames feel 'real' in Augmented Reality (ieeexplore.ieee.org)

Celebrate Java's 30th Birthday (dev.java)

Update from Dianna [video] (youtube.com)

Open-Source Runtime for Scalable AI Agent Deployment with Ray and FastAPI (kodosumi.io)

D-Wave Announces General Availability of Advantage2 Quantum Computer (businesswire.com)

Ship AI-generated code with confidence (vybecheck.io)

Show HN: ScrollSnap – Open-Source macOS App for Scrolling Screenshots in Swift (github.com)

The future is made out of energy (orcasciences.com)

Scaling Instagram's Recommendation System (engineering.fb.com)

Never-before-seen 'extreme' microbes surrounded NASA robot (livescience.com)

Next Password Could Be Stored in Plastic (spectrum.ieee.org)

Sublime Text Build 4200 and Future Plugin Changes (sublimetext.com)

3 Years into Foldables (berti92.github.io)

Agent Memory in Portia AI's Open-Source Agent Framework (blog.portialabs.ai)

Cybersecurity Forecast 2025 [pdf] (gstatic.com)

KotlinConf 2025 is a real bowl of fresh air for back end devs (lengrand.fr)

Show HN: Find prompts that jailbreak your agent (open source)

Comments (0)