Thank you to Tom Warren (and The Verge) for specifying "open weight model" in paragraph two, and for going into more detail on what we know and don't know about what that means in paragraph nine.
> "OpenAI is preparing to announce the language model as an “open model,” but that terminology, which often gets confused with open-source, is bound to generate a lot of debate around just how open it is. That will all come down to what license is attached to it and whether OpenAI is willing to provide full access to the model’s code and training details, which can then be fully replicated by other researchers."
Legend2440 · 4h ago
'open source' isn't even a meaningful term for LLMs, since they don't have source code in the traditional sense.
vrighter · 4h ago
Yes they do: the training code, the dataset, and the random seed. Given all three, you can recreate the final model (the weights). The set of all three is the source code.
lavoiems · 4h ago
That, plus the nondeterminism from race conditions during training.
herbst · 4h ago
If I could theoretically fully rebuild it from the released data, I would call it open source. See the upcoming ETH model, which should be completely reproducible.
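The reproducibility claim in this thread can be sketched with a toy, single-threaded example (a hedged illustration, not how real LLM training works): when the training code, the data, and the seed are all fixed and there is no parallelism, two runs produce bit-for-bit identical weights. The function name and the toy SGD task below are invented for illustration; real distributed GPU training adds nondeterminism (the race conditions mentioned above, plus floating-point non-associativity), which is exactly why exact replication is harder in practice.

```python
import random

def train_toy_model(seed, steps=1000):
    """Toy single-threaded 'training': SGD fitting y = 2x + 1 on noisy data.
    Stands in for the claim that code + dataset + seed determine the weights.
    (Hypothetical example; not an actual LLM training loop.)"""
    rng = random.Random(seed)                    # the "random seed" part
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random weight init
    lr = 0.01
    for _ in range(steps):
        x = rng.uniform(-1, 1)                   # the "dataset" (generated here)
        y = 2 * x + 1 + rng.gauss(0, 0.01)       # noisy target
        err = (w * x + b) - y
        w -= lr * err * x                        # gradient step on w
        b -= lr * err                            # gradient step on b
    return w, b

# Same code + data + seed => bit-for-bit identical final weights.
assert train_toy_model(42) == train_toy_model(42)
# A different seed gives different weights.
assert train_toy_model(42) != train_toy_model(7)
```

With multiple workers updating shared weights concurrently, the order of floating-point additions varies between runs, so even the same seed no longer guarantees identical weights.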