A recent study conducted by physician-investigators at Beth Israel Deaconess Medical Center (BIDMC) compared the probabilistic reasoning abilities of a chatbot with those of human clinicians. The findings, published in JAMA Network Open, suggest that artificial intelligence (AI) could serve as a valuable clinical decision-support tool for physicians.
AI’s Role in Probabilistic Reasoning
The study’s author, Dr. Adam Rodman, emphasized how difficult probabilistic reasoning — making decisions based on calculated odds — is for humans. Probabilistic reasoning is a crucial component of the complex process of diagnosis, and Dr. Rodman’s team chose to evaluate this aspect in isolation because it is an area where humans could benefit from support.
The study was based on a national survey in which more than 550 practitioners performed probabilistic reasoning on five medical cases. The researchers then used the publicly available large language model (LLM) GPT-4, via ChatGPT, to analyze the same cases. The chatbot estimated the likelihood of specific diagnoses based on patient presentations and updated its estimates when test results were introduced.
The study revealed that when test results were positive, the chatbot’s diagnostic accuracy was comparable to that of human clinicians in most cases. When test results were negative, however, the chatbot outperformed human clinicians in all five cases, maintaining more accurate probability estimates after a negative result than the clinicians did.
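The kind of updating the study measured is, at its core, Bayesian: a pretest probability is revised upward or downward once a test result arrives. As a minimal sketch (not the study’s actual cases or methodology — the function name and the example numbers below are illustrative assumptions), the calculation looks like this:

```python
def post_test_probability(pretest: float, sensitivity: float,
                          specificity: float, test_positive: bool) -> float:
    """Revise a pretest disease probability with Bayes' theorem.

    sensitivity  = P(test positive | disease)
    specificity  = P(test negative | no disease)
    """
    if test_positive:
        p_result_given_disease = sensitivity
        p_result_given_no_disease = 1 - specificity  # false positive rate
    else:
        p_result_given_disease = 1 - sensitivity     # false negative rate
        p_result_given_no_disease = specificity

    numerator = p_result_given_disease * pretest
    denominator = numerator + p_result_given_no_disease * (1 - pretest)
    return numerator / denominator


# Hypothetical example: 30% pretest probability, a test with 90%
# sensitivity and 80% specificity.
after_positive = post_test_probability(0.30, 0.90, 0.80, test_positive=True)
after_negative = post_test_probability(0.30, 0.90, 0.80, test_positive=False)
print(f"after positive test: {after_positive:.2f}")  # probability rises
print(f"after negative test: {after_negative:.2f}")  # probability falls
```

The negative-result branch is where the study found clinicians struggled most: the post-test probability should drop sharply after a negative result from a sensitive test, and the chatbot tracked that drop more reliably than the surveyed practitioners.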
Chatbot Impact on Clinical Decision-Making
Dr. Rodman is interested in how the availability of AI support tools like chatbots might influence the performance of highly skilled physicians in clinical settings. While acknowledging that LLMs do not calculate probabilities the way experts do, he believes that integrating them into clinical workflows could improve human clinicians’ decision-making.
Co-authors of the study included experts from the University of Massachusetts Amherst, Harvard Medical School, and the University of Maryland School of Medicine. Grants from organizations such as the Gordon and Betty Moore Foundation and the Department of Veterans Affairs, among others, supported the research.