OpenAI says its GPT-5.5 Instant model now answers health questions about as well as the company's frontier Thinking models, and that in its own evaluations the model's answers were rated higher than answers written by physicians on accuracy, communication, and completeness. The update is rolling out to all ChatGPT users, including the free tier, which is the part that makes it notable: GPT-5.5 Instant is the fast, default model most people get, not one of the slower reasoning models reserved for paying users.

The reach is the reason it matters. OpenAI says more than 230 million people turn to ChatGPT with health and wellness questions every week, so improving the model that fields most of those questions by default touches far more people than a gain on a premium tier would. The company says the new version is better at recognizing when a situation may need urgent care, asking for relevant context before answering, explaining how confident or uncertain it is, and translating dense medical information into plain language. It also says incorrect health statements dropped by roughly 71 percent over two months of work.

On the measurement side, OpenAI points to an aggregate of health evaluations, including a benchmark it calls HealthBench Professional, on which it says GPT-5.5 Instant reaches a level comparable to its frontier reasoning models. The company also describes a global network of more than 260 physicians across 60 countries who help define and grade what a good health answer looks like, and it is this kind of expert review that produced the headline claim that the model outscored doctors' own written responses.

The caveats deserve equal billing. Every one of these results comes from OpenAI's own benchmarks and its own reviewers, with no independent or peer-reviewed validation released alongside them. Outscoring physician-written answers in a rated study measures the quality of a piece of text as judged by a panel; it is not the same as measuring what happens when a real person acts on the advice. The exact comparison setup, including whether the model and the doctors answered identical prompts at the same level of detail, is not fully laid out either. There is also the plain fact that this is the fast model rather than the deliberate one, now handling health questions for hundreds of millions of free users by default. OpenAI still says ChatGPT is not a substitute for professional care.

The announcement lands at the end of a week full of medical-AI claims, from an unproven full-body scanner to a grounded, peer-reviewed result where OpenAI's o3 helped diagnose rare diseases at Boston Children's Hospital. This one sits somewhere in between: a capability gain that is probably real and genuinely useful for the everyday questions people are already asking, wrapped in a marketing claim that a company should not get to settle about its own product. Better health answers for more than 230 million people a week is a real good. A vendor grading its own model above doctors is a claim to keep treating as a claim until someone outside the company checks it.