Sycophancy is noticeably higher, and a couple of tests in domains where I can assess output quality from my own expertise (lay out a parts buyer workflow given some proprietary details, explain why measures can't distinguish between two given countable subsets of the transcendentals, write a contrarian defense of Thrasymachus, show how the SEC phase of UEFI boot changed from pre-8 to 8 to 10 and 11) showed no difference in quality.
I'm gonna stick with v3-0324 and I recommend that others do the same.
martianlantern · 4h ago
Are there any benchmarks or comparisons against gpt-oss? I believe it far exceeds gpt-oss or even gpt5, otherwise they wouldn't have released it.
guluarte · 12m ago
Scores 71.6% on Aider Benchmark
rmoriz · 3h ago
The model was released literally one hour ago, so we need to be a little more patient.