Thank you to Tom Warren (and The Verge) for specifying "open weight model" in paragraph two, and for going into more detail on what we know and don't know about what that means in paragraph nine.
> "OpenAI is preparing to announce the language model as an “open model,” but that terminology, which often gets confused with open-source, is bound to generate a lot of debate around just how open it is. That will all come down to what license is attached to it and whether OpenAI is willing to provide full access to the model’s code and training details, which can then be fully replicated by other researchers."
Legend2440 · 20h ago
'open source' isn't even a meaningful term for LLMs, since they don't have source code in the traditional sense.
vrighter · 20h ago
Yes, they do: the training code, the dataset, and the random seed. Given all three, you can recreate the final model (the weights). The set of all three is the source code.
lavoiems · 20h ago
That, and nondeterminism from race conditions occurring during training.
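Both points can be sketched in a few lines. This is a hypothetical toy example, not anything resembling real LLM training: `toy_train` stands in for a training run whose output is fully determined by code + data + seed, and the float-addition lines show why nondeterministic ordering (e.g. from parallel gradient accumulation) can break bit-exact reproduction, since floating-point addition is not associative.

```python
import random

def toy_train(seed):
    # Hypothetical stand-in for a training run: the "weights"
    # depend only on the code, the dataset, and the seed.
    rng = random.Random(seed)
    data = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    w = 0.0
    for x in data:          # fixed, serial accumulation order
        w += 0.01 * x
    return w

# Same seed + same code + same data -> bit-identical result.
assert toy_train(42) == toy_train(42)

# But parallel training sums contributions in whatever order the
# hardware finishes them, and float addition is not associative:
a = (1e16 + 1.0) - 1e16   # -> 0.0 (the 1.0 is absorbed)
b = (1e16 - 1e16) + 1.0   # -> 1.0
print(a, b)
```

So "source" in the training-code-plus-data-plus-seed sense gets you reproducibility only if the accumulation order is also pinned down, which is exactly what race conditions undermine.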
herbst · 20h ago
If I could theoretically fully rebuild it from the given data, I would call it open source. See the upcoming ETH model; it should be completely reproducible.