Researchers at Boston Children's Hospital and Harvard, working with OpenAI, report in the peer-reviewed journal NEJM AI that the company's o3 model helped establish 18 new diagnoses for children whose rare-disease cases had gone unsolved. Unlike most of this week's medical-AI announcements, this one arrives with real patients, real diagnoses, and peer review behind it.

The team ran o3 over several hundred genomes from patients who had spent years without an answer, using it as what the hospital calls a co-pilot geneticist: a system that pulls together genetic data, the patient's clinical phenotype, and the global medical literature to surface candidate explanations that a human geneticist then evaluates. Across that set, it led to new diagnoses in close to 5 percent of cases (18 in the study). The hospital says its broader co-pilot effort has now contributed to more than 40 diagnoses that were once thought impossible.

For rare-disease families, that number is not abstract. A diagnostic odyssey can run for years, full of repeated tests, dead ends, and no name for what is wrong, and a single correct answer can change treatment, end the searching, and connect a family to others with the same condition. The promise here is not that AI replaces the geneticist, but that it can read more of the literature and cross-reference more of the genome than a person can in the time available, then hand a shortlist back to a clinician to confirm.

The caveats are plain: a 5 percent rate means the great majority of hard cases still go unsolved; the model surfaces candidates rather than confirming them; and a human expert remains in the loop for every call. But the study stands apart from the week's flashier claims: where an image company announced an unproven full-body scanner and a new benchmark showed the best models clearing only about a third of expert science tasks, this is a smaller, grounded result with peer review and actual children behind it. Real, modest, and checked is its own kind of headline.