The pitfall of open-weight LLMs

hiddenest · 6/25/2025, 11:44:14 PM
Some startups are fine-tuning open LLMs instead of using GPT or Gemini, sometimes to support a specific language, sometimes for narrow tasks. But I found they're all making the same mistake.

With a simple prompt (which I won't share here), I got several "custom" LLM services to spill their internal system prompts, including things like security-breach response playbooks and product action lists.

For example, SKT A.X 4.0 (based on Qwen 2.5) returned internal guidelines related to the recent SKT data breach and instructions about compensation policies. Vercel’s v0 model leaked examples of actions their system can generate.

The point: if the base model can be coaxed into revealing its prompt, every service built on it inherits that weakness, no matter how much you fine-tune. We need to think not only about system-prompt hardening at the service level, but also about upstream improvements and more robust defenses in open-weight LLMs themselves.
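For illustration, here is a minimal sketch of one service-level mitigation: an output filter that refuses to return a response that substantially echoes the system prompt. The function names and the n-gram overlap threshold are my own assumptions, not something any of these services actually use.

```python
# Minimal sketch of a service-level leak filter (illustrative only).
# Assumption: the service wraps its LLM call and can inspect each response
# before returning it to the user.

def ngram_set(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in `text`, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def leaks_system_prompt(response: str, system_prompt: str,
                        n: int = 5, threshold: float = 0.2) -> bool:
    """Flag a response if too many of the system prompt's n-grams appear in it."""
    prompt_ngrams = ngram_set(system_prompt, n)
    if not prompt_ngrams:
        return False
    overlap = len(prompt_ngrams & ngram_set(response, n)) / len(prompt_ngrams)
    return overlap >= threshold

def guarded_reply(response: str, system_prompt: str) -> str:
    """Return the model's response, or a refusal if it echoes the system prompt."""
    if leaks_system_prompt(response, system_prompt):
        return "Sorry, I can't share that."
    return response
```

Of course this only catches near-verbatim leakage at the service layer; paraphrased leaks, and the upstream weakness in the base model itself, still need to be addressed.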

Comments (1)

bigyabai · 10h ago
You shouldn't trust any LLM with data that could be leaked to an end user, period. If you do, it's not an issue with the weights; it's a glaring oversight in your security model.