Microsoft has revealed details of an artificial intelligence system that performs better than human doctors at complex health diagnoses, creating a “path to medical superintelligence”.
The company’s AI unit developed a system that imitates a panel of expert physicians tackling “diagnostically complex and intellectually demanding” cases.
Microsoft said that when paired with OpenAI’s advanced o3 AI model, its approach “solved” more than eight of 10 case studies specially chosen for the diagnostic challenge. When practising physicians attempted the same cases – working without access to colleagues, textbooks or chatbots – their accuracy rate was two out of 10.
Microsoft said the system was also a cheaper option than using human doctors because it was more efficient at ordering tests.
Despite highlighting the potential cost savings from its research, Microsoft played down the job implications, saying it believed AI would complement doctors’ roles rather than replace them.
“Their clinical roles are much broader than simply making a diagnosis. They need to navigate ambiguity and build trust with patients and their families in a way that AI isn’t set up to do,” the company wrote in a blogpost announcing the research, which is being submitted for peer review.
However, using the slogan “path to medical superintelligence” raises the prospect of radical change in the healthcare market. While artificial general intelligence (AGI) refers to systems that match human cognitive abilities at any given task, superintelligence is an equally theoretical term referring to a system that exceeds human intellectual performance across the board.
Explaining the rationale behind the research, Microsoft cast doubt on the significance of AI models scoring exceptionally well in the United States Medical Licensing Examination, a key test for obtaining a medical licence in the US. It said the exam’s multiple-choice format favoured memorising answers over deep understanding of a subject, which could “overstate” the competence of an AI model.
Microsoft acknowledged that its work was not ready for clinical use. Further testing of its “orchestrator” is needed, for instance, to assess its performance on more common, everyday symptoms.