The Path to Medical Superintelligence

8 points | brandonb | 7 comments | 6/30/2025, 2:30:29 PM | microsoft.ai

Comments (7)

gm678 · 8h ago
> Microsoft AI Diagnostic Orchestrator (MAI-DxO) correctly diagnoses up to 85% of NEJM case proceedings, a rate more than four times higher than a group of experienced physicians.

> Clinicians in our study worked without access to colleagues, textbooks, or even generative AI, which may feature in their normal clinical practice.

1. As I understand it, it's very common for doctors to fall back on reference material in their practice, especially for the most complex cases. If all access to resources were cut off (as the second quote seems to imply), the comparison seems somewhat unfair.

2. What were the publication dates of the case records? I can't find this information, and it makes a difference if the NEJM case studies were in the LLMs' training data.

miraculixx · 6h ago
Exactly. The study was set up to produce this exact result. They essentially limited the human doctors to the bare essentials, on specialist cases(!), while providing the LLMs with all sorts of help, including discussion among several AIs.

That's like giving one group of students a strict closed-book exam while letting another group take the test as a group exercise with access to any material they like, then claiming that closed-book exams lead to worse outcomes.

In a nutshell, the study is just slop designed to get attention. The headline result is what they really want people to hear, and that's all the media will be repeating.

PaulHoule · 9h ago
I was doing a comparative analysis of the acquisition strategies of various "big tech" firms and was a little startled that I had missed Microsoft's 2022 acquisition of Nuance, largely for its speech recognition systems aimed at the medical sector:

https://news.microsoft.com/source/2022/03/04/microsoft-compl...

miraculixx · 6h ago
As any AI researcher knows, if you have a model that does 4x better than the naive baseline (the humans, in this case), you are likely looking at overfitting, not real-life performance. This study is just slop, and you can tell by the mere fact that they did not submit a paper, but just published a PR article.
brandonb · 5h ago
In the paper, they say they used the most recent 56 cases (from 2024–2025) as a holdout set. The majority of those cases were published after the o4 training cutoff of May 31, 2024.
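The contamination check being discussed here basically reduces to a date filter. A minimal sketch of that idea, assuming a list of case records with publication dates (the case IDs and dates below are placeholders, not the paper's actual data):

```python
from datetime import date

# Illustrative only: case IDs and publication dates are made up.
TRAINING_CUTOFF = date(2024, 5, 31)  # cutoff mentioned above

cases = [
    {"id": "case-2023-31", "published": date(2023, 10, 5)},
    {"id": "case-2024-48", "published": date(2024, 12, 5)},
    {"id": "case-2025-02", "published": date(2025, 1, 16)},
]

# Keep only cases published after the training cutoff, so the model
# could not have seen them during pretraining.
holdout = [c for c in cases if c["published"] > TRAINING_CUTOFF]
print(f"{len(holdout)} of {len(cases)} cases are post-cutoff")
```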
miraculixx · 4h ago
Are these 56 cases distinct from all other cases in the data?
LargoLasskhyfv · 6h ago
They didn't? What am I looking at, then?

https://arxiv.org/abs/2506.22405

This is what appears when you click 'View Publication' near the end of the article, right before the Q&A.