Uhhh, yes. It's in the devblogs. They call it prompt adherence hierarchy or something, where system instructions (oAI) > dev instructions (you) > user requests. They've been training this way specifically, and test for it in their "safety" analysis. Same for their -oss versions, so tinkerers might look there for a tinker friendly environment where they could probably observe the same kinds of behaviour.
irthomasthomas · 1h ago
Please can you link me to the documentation on this.
NitpickLawyer · 1h ago
Yeah, it's in the "gpt5 system card" as they call it now [1]. Page 9 has the details about system > dev > user.
3.5 Instruction Hierarchy
The deployment of these models in the API allows developers to specify a custom developer message that is included with every prompt from one of their end users. This could potentially allow developers to circumvent system message guardrails if not handled properly. Similarly, end users may try to circumvent system or developer message guidelines.
Mitigations
To mitigate this issue, we teach models to adhere to an Instruction Hierarchy[2]. At a high level, we have three classifications of messages sent to the models: system messages, developer messages, and user messages. We test that models follow the instructions in the system message over developer messages, and instructions in developer messages over user messages.
Is this what you meant? I can see that this is part of the mechanism, I can't see where it states that openai will inject their own instructions.
Uhhh, yes. It's in the devblogs. They call it prompt adherence hierarchy or something, where system instructions (oAI) > dev instructions (you) > user requests. They've been training this way specifically, and test for it in their "safety" analysis. Same for their -oss versions, so tinkerers might look there for a tinker friendly environment where they could probably observe the same kinds of behaviour.
1 - https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb...