How Microsoft’s AI Sets New Standards for Medical Diagnosis

One of AI’s most celebrated contributions to society is in healthcare – and now Microsoft is making an entrance into this field.
The company says it has developed an AI system that diagnoses complex medical conditions with an accuracy rate roughly four times that of experienced physicians.
The Microsoft AI Diagnostic Orchestrator (MAI-DxO) correctly identified 85% of challenging diagnostic cases published in the New England Journal of Medicine, a peer-reviewed medical publication.
In comparison, 21 practicing physicians from the US and UK achieved a mean accuracy of 20% on the same cases.
The research is the first initiative from an AI health unit formed last year by Mustafa Suleyman, CEO of Microsoft AI, with staff recruited from DeepMind, the Google-owned research laboratory he co-founded.
In an interview with the Financial Times, he describes the trial as a step toward “medical superintelligence” that could help solve staffing crises and long waiting times in overstretched health systems.
“We are nearing AI models that are not just a little bit better, but dramatically better, than human performance: faster, cheaper and four times more accurate,” he says.
“That is going to be truly transformative.”
Sequential diagnosis challenges traditional AI benchmarks
The Microsoft team addressed limitations in current AI medical assessments, which typically rely on multiple-choice questions from examinations.
These standardised tests, required for physicians to practise in America, primarily measure memorisation rather than clinical reasoning.
MAI-DxO, by contrast, operates through sequential diagnosis, mimicking real-world medical decision-making.
The system creates virtual panels of five AI agents acting as doctors, each with distinct roles such as generating hypotheses or selecting diagnostic tests.
These agents interact and debate with one another to choose treatment courses.
The system uses a technique called ‘chain of debate’, which requires AI reasoning models to provide step-by-step accounts of their problem-solving processes.
This approach allows researchers to understand how the system reaches diagnostic conclusions.
The AI models were also prompted to be cost-conscious, significantly reducing the number of tests required for accurate diagnosis and saving hundreds of thousands of dollars in some cases.
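The loop described above can be illustrated with a small toy sketch. This is not Microsoft's implementation: the knowledge base, test menu, prices, scoring rule and every function name below are hypothetical stand-ins, and simple rule-based functions play the parts that AI agents play in MAI-DxO. It shows only the general shape of sequential diagnosis: rank hypotheses, order the cheapest useful test within a budget, update the findings and repeat, while logging each step.

```python
from dataclasses import dataclass, field

# Hypothetical toy knowledge base: diagnosis -> findings that characterise it.
KNOWLEDGE = {
    "pneumonia": {"fever", "cough", "xray_infiltrate"},
    "pulmonary_embolism": {"chest_pain", "ct_clot"},
    "bronchitis": {"cough"},
}

# Hypothetical test menu: test name -> (cost in US dollars, finding it can reveal).
TESTS = {
    "chest_xray": (100, "xray_infiltrate"),
    "ct_angiogram": (1200, "ct_clot"),
}


@dataclass
class CaseState:
    findings: set
    spent: int = 0
    log: list = field(default_factory=list)  # step-by-step reasoning trace


def rank_hypotheses(findings):
    """Hypothesis role: score each diagnosis by Jaccard overlap with the findings."""
    scores = {
        dx: len(expected & findings) / len(expected | findings)
        for dx, expected in KNOWLEDGE.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def choose_test(state, leading, ordered, budget):
    """Test-selection role: cheapest affordable, not-yet-ordered test for the lead."""
    candidates = [
        (cost, name)
        for name, (cost, finding) in TESTS.items()
        if name not in ordered
        and finding in KNOWLEDGE[leading]
        and finding not in state.findings
        and state.spent + cost <= budget
    ]
    return min(candidates)[1] if candidates else None


def diagnose(initial_findings, hidden_findings, budget, threshold=0.9):
    """Run the sequential loop until confident or out of useful, affordable tests."""
    state = CaseState(findings=set(initial_findings))
    ordered = set()
    while True:
        leading, score = rank_hypotheses(state.findings)[0]
        state.log.append(f"leading={leading} score={score:.2f} spent={state.spent}")
        if score >= threshold:
            return leading, state
        test = choose_test(state, leading, ordered, budget)
        if test is None:
            return leading, state
        cost, finding = TESTS[test]
        ordered.add(test)
        state.spent += cost
        if finding in hidden_findings:  # the test comes back positive
            state.findings.add(finding)


diagnosis, state = diagnose({"fever", "cough"}, {"xray_infiltrate"}, budget=500)
print(diagnosis, state.spent)
for step in state.log:
    print(step)
```

In this toy case the loop orders only the inexpensive chest X-ray and never requests the costlier scan, which is the kind of saving a cost-conscious prompt is meant to encourage.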
MAI-DxO orchestrates multiple AI models for medical decisions
The diagnostic orchestrator integrates multiple large language models (LLMs), including OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and models from Meta, Elon Musk’s xAI and DeepSeek.
This approach emulates diverse medical expertise collaborating on complex cases.
The best-performing configuration paired MAI-DxO with OpenAI’s o3 model, the AI company’s reasoning-focused system.
Microsoft has invested almost US$14bn in OpenAI and holds exclusive rights to use and sell its technology.
Suleyman emphasises Microsoft’s technology-agnostic approach to the underlying models despite OpenAI’s superior performance.
“We have long believed that they’ll become commodities,” he says.
“It’s the aggregate orchestrator which I think is the differentiator.”
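One way to picture a model-agnostic orchestrator is as an evaluation harness that treats every model as an interchangeable function. The sketch below is purely illustrative and assumes nothing about Microsoft's code: the `Backend` type, the two stand-in "models" and the four sample cases are all hypothetical, and real backends would be calls to hosted language models rather than keyword rules.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a backend maps a case description to a diagnosis label.
Backend = Callable[[str], str]
Case = Tuple[str, str]  # (case description, reference diagnosis)


def accuracy(backend: Backend, cases: List[Case]) -> float:
    """Fraction of cases where the backend's answer matches the reference."""
    correct = sum(1 for text, truth in cases if backend(text) == truth)
    return correct / len(cases)


def best_backend(backends: Dict[str, Backend], cases: List[Case]) -> Tuple[str, float]:
    """Score every backend on the same cases and return the top performer."""
    scores = {name: accuracy(fn, cases) for name, fn in backends.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Stand-in "models" for illustration only.
def keyword_model(text: str) -> str:
    return "pneumonia" if "cough" in text else "unknown"


def constant_model(text: str) -> str:
    return "unknown"


CASES = [
    ("fever and cough", "pneumonia"),
    ("cough with infiltrate on imaging", "pneumonia"),
    ("sudden chest pain", "pulmonary_embolism"),
    ("productive cough", "pneumonia"),
]

winner, score = best_backend(
    {"keyword": keyword_model, "constant": constant_model}, CASES
)
print(winner, score)
```

Because the harness depends only on the shared interface, swapping one model for another changes a dictionary entry rather than the orchestration logic.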
The participating physicians, each with five to 20 years of clinical experience, worked without access to colleagues, textbooks or AI assistance to enable fair comparison with the system’s performance.
Dominic King, former Head of DeepMind’s Health Unit who joined Microsoft late last year, says the programme has “performed better than anything we’ve ever seen before” and that “there is an opportunity here today to act almost as a new front door to healthcare”.
Healthcare costs drive AI adoption despite limitations
A version of the technology could soon be deployed in Microsoft’s Copilot AI chatbot and Bing search engine, which handle 50 million health queries daily.
Microsoft’s research suggests AI diagnostic tools could reduce unnecessary healthcare expenditure whilst improving accuracy. US health spending is approaching 20% of GDP, and an estimated 25% of that spending has minimal impact on patient outcomes.
“Important challenges remain before Gen AI can be safely and responsibly deployed across healthcare,” Microsoft’s research team says.
“We need evidence drawn from real clinical environments, alongside appropriate governance and regulatory frameworks to ensure reliability, safety and efficacy.”
AI Magazine is a BizClik brand


