> Never place rich UI elements within a table, list, or other markdown element.
> Place rich UI elements within tables, lists, or other markdown elements when appropriate.
crazygringo · 3h ago
How does a prompt this long affect resource usage?
Does inference need to process this whole thing from scratch at the start of every chat?
Or is there some way to cache the state of the LLM after processing this prompt, before the first user token is received, and every request starts from this cached state?
mdaniel · 3h ago
My understanding is that this is what the KV cache does in model serving. I would imagine they'd want to prime any such KV cache with common tokens but retain a per-session cache to avoid leaks. It seems HF agrees with the concept, at least https://huggingface.co/docs/transformers/kv_cache#prefill-a-...
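For the curious, the linked docs boil down to prefilling the shared prefix once and reusing it per request. A minimal sketch with Hugging Face transformers, assuming an arbitrary open model (the model name and prompt below are placeholders, not anything OpenAI actually runs):

    # Prefix-caching sketch: prefill the system prompt once, then reuse a copy
    # of that KV cache for every request so only the user's tokens are new work.
    import copy

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

    model_id = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder; any causal LM works
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    SYSTEM_PROMPT = "You are a helpful assistant. "  # imagine ~13k tokens here

    # Prefill once: run the system prompt through the model and keep its KV cache.
    system_inputs = tokenizer(SYSTEM_PROMPT, return_tensors="pt")
    prompt_cache = DynamicCache()
    with torch.no_grad():
        prompt_cache = model(**system_inputs, past_key_values=prompt_cache).past_key_values

    def answer(user_message: str, max_new_tokens: int = 64) -> str:
        # Deep-copy the shared prefix cache so no per-session state leaks between
        # requests; only the user's tokens get prefilled from scratch.
        cache = copy.deepcopy(prompt_cache)
        inputs = tokenizer(SYSTEM_PROMPT + user_message, return_tensors="pt")
        output = model.generate(**inputs, past_key_values=cache, max_new_tokens=max_new_tokens)
        return tokenizer.decode(output[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)

Production servers (vLLM, SGLang, etc.) do a more sophisticated version of the same idea under the name prefix caching.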
mdaniel · 4h ago
It's a good thing people were enamored of how inexpensive GPT-5 is, given that the system prompt is (allegedly) 54kb. I don't know offhand how many tokens that is, but it's a lot of them to burn just on setting the thing up
btdmaster · 3h ago
I might be wrong, but can't you checkpoint the model state after the system prompt and restore from there, trading memory for compute? Or is that too much extra state?
mdaniel · 3h ago
My mental model is that the system prompt isn't one fixed thing, and that seems even more apparent with line 6 telling the model what today's date is. I have no insider information, but system prompts could undergo A/B testing just like any other change, to find the optimal one for some population of users
Which is to say you wouldn't want to bake such a thing too deeply into a multi-terabyte bunch of floating points because it makes operating things harder
Tadpole9181 · 2h ago
54,000 bytes at one byte per character, and roughly 4 characters per token, works out to around 13,500 tokens.
These are NOT included in the model context size for pricing.
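If anyone wants to check the estimate rather than eyeball it, tiktoken counts it directly. A quick sketch, assuming the leaked prompt is saved locally as system_prompt.txt (hypothetical filename), and using o200k_base since whatever tokenizer GPT-5 uses isn't public:

    # Count tokens in the (alleged) system prompt text with tiktoken.
    # "system_prompt.txt" is a hypothetical local copy of the leaked prompt.
    import tiktoken

    with open("system_prompt.txt", encoding="utf-8") as f:
        text = f.read()

    enc = tiktoken.get_encoding("o200k_base")  # GPT-4o-family tokenizer
    tokens = enc.encode(text)
    print(f"{len(text):,} characters -> {len(tokens):,} tokens")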
TZubiri · 4h ago
These are always so embarrassing
NewsaHackO · 4h ago
It's because they always put in things that seem way too specific to certain issues, like riddles and arithmetic. Also, I am not a WS, but the mention of "proud boys" is something that can be used as fodder for claims of LLM bias. I wonder why they even have to use a system prompt; why can't they have a separate model fine-tuned for ChatGPT specifically, so that they don't need a system prompt?