Microsoft launched Copilot Health this month, letting users connect medical records and ask health questions directly through its chatbot. Days earlier, Amazon expanded Health AI beyond its One Medical subscribers to general availability. These join OpenAI's ChatGPT Health and Anthropic's Claude in a sudden rush to deploy consumer health AI, driven by massive demand—Microsoft alone fields 50 million health questions daily through Copilot.
The timing isn't coincidental. These companies argue LLMs have crossed a capability threshold where they can safely provide medical advice. Microsoft's Dominic King, a former surgeon leading its health AI efforts, points to "enormous progress in the capabilities of generative AI to answer health questions." But that assessment comes from the very companies building these products, which raises red flags about the state of independent oversight in healthcare applications.
Researchers are pushing back on the lack of independent evaluation. While some studies suggest current LLMs can make useful health recommendations, experts argue these tools need rigorous third-party testing before wide release—not just internal company research. "The evidence base really needs to be there," says Oxford's Andrew Bean, highlighting the risk of companies having blind spots when evaluating their own high-stakes products.
For developers and AI users, this wave presents both opportunity and cause for caution. The demand is clearly massive, and the technical capability may finally exist. But building or deploying health AI without independent safety validation could expose users to serious risks that company-led evaluations might miss.
