
Well over a decade ago, I recall learning about a covert data exfiltration method that could bypass firewalls by using DNS lookups. The payload would be base64-encoded and prepended as a hostname prefix to an attacker-controlled domain. Adding a timestamp to the prefix data would guarantee uniqueness and get around local caching DNS servers, so every lookup would be forwarded upstream.
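A minimal sketch of the name construction described above, for illustration only: it builds and parses strings and issues no DNS query. The function names and the `example.test` domain are hypothetical. One swap from the description: it uses base32 rather than base64, since standard base64 emits `+`, `/` and `=`, which are not valid in hostnames, and DNS names are case-insensitive.

```python
import base64
import time
from typing import Optional, Tuple

MAX_LABEL = 63   # longest single DNS label
MAX_NAME = 253   # longest full hostname in textual form


def build_lookup_name(payload: bytes, domain: str,
                      now: Optional[float] = None) -> str:
    """Return a hostname carrying `payload` plus a timestamp label."""
    # Base32 (assumption, see lead-in): letters and digits only.
    encoded = base64.b32encode(payload).decode("ascii").rstrip("=").lower()
    if len(encoded) > MAX_LABEL:
        raise ValueError("payload too large for one DNS label")
    # The timestamp makes each name unique, so a caching resolver
    # has never seen it and cannot answer from its cache.
    stamp = format(int(time.time() if now is None else now), "x")
    name = f"{encoded}.{stamp}.{domain}"
    if len(name) > MAX_NAME:
        raise ValueError("hostname too long")
    return name


def decode_lookup_name(name: str, domain: str) -> Tuple[bytes, int]:
    """Recover (payload, timestamp) from a name built above."""
    prefix = name[: -(len(domain) + 1)]
    encoded, stamp = prefix.split(".")
    padding = "=" * (-len(encoded) % 8)  # restore stripped base32 padding
    return base64.b32decode(encoded.upper() + padding), int(stamp, 16)
```

On the defensive side, this is why long, high-entropy leftmost labels that never repeat for a single parent domain are a common signal in DNS log analysis.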

acmiyaguchi · 4h ago

The idea of using steganographic techniques to exfiltrate data is interesting, but I don't quite follow the general method outlined in the repository -- either through the generated documentation or code. The threat model and case studies seem contrived. I find it hard to believe that folks would expose data via RAG that they wouldn't want users of the underlying system to be privy to.

There's too much fluff here to be useful. I imagine having something that is concise and concrete would make it more appealing to others. But as-is, it's missing a good technical summary and demonstration.

smugglereal · 2h ago

Thanks for the feedback!

It's less about the RAG exposing new data to a regular user, and more about using the vector pipeline as a covert channel. The idea is to sneak out data the attacker can already access, but in a way that might bypass traditional DLP that watches email, USB drives, etc.

The "fluff" is largely educational material, as the project is for research and learning. For a concrete technical demonstration, scripts/embed.py and scripts/query.py are the core, and docs/guides/quick_start.md tries to offer a direct path to seeing it in action.

Hope that helps! Will add a video demo soon.

smugglereal · 8h ago

A comprehensive proof-of-concept demonstrating sophisticated vector-based data exfiltration techniques in AI/ML environments. This educational security research project illustrates potential risks in RAG systems and provides tools for defensive analysis.

VectorSmuggle: Covertly Exfiltrate Data in Embeddings

Comments (4)