AI Explanation Type Affects Physician Diagnostic Performance and Trust in AI
Academic Background
In recent years, artificial intelligence (AI) diagnostic systems for healthcare, and for radiology in particular, have developed rapidly, showing potential to support overburdened healthcare providers and improve patient care. As of 2022, the U.S. Food and Drug Administration (FDA) had approved 190 radiology AI software programs, and approvals continue to increase each year. However, a significant gap remains between proof of concept and actual integration of AI into clinical practice. Bridging this gap requires fostering appropriate trust in AI advice: highly accurate AI systems have been shown to enhance physician diagnostic performance and patient outcomes in real-world settings, but incorrect AI advice can degrade diagnostic performance, which has understandably slowed the translational implementation of AI.
Clinicians have called for AI tools to be transparent and interpretable. In medical imaging, AI tools can provide two broad categories of explanations: local and global. Local explanations justify a specific prediction for a particular input (e.g., by highlighting informative image features on a radiograph), whereas global explanations describe how the AI tool works overall (e.g., by noting that its decisions are based on comparisons with prototypical images of each diagnostic class). Clinicians also often value knowing the confidence or uncertainty of AI outputs when deciding whether to act on AI advice. However, clinicians and AI developers disagree about how useful these two main types of AI explanation are in healthcare applications, particularly in radiology diagnostics.
Research Purpose and Background
This study tested whether the type of AI explanation, the correctness of AI advice, and the confidence level of AI advice affect physicians’ diagnostic performance, perception of AI advice, and trust in AI advice during chest radiograph diagnosis. The hypothesis was that these three factors would influence physicians’ diagnostic accuracy, efficiency, diagnostic confidence, and perception of AI advice.
Source of the Paper
This paper was co-authored by Drew Prinster, Amama Mahmood, Suchi Saria, Jean Jeudy, Cheng Ting Lin, Paul H. Yi, and Chien-Ming Huang, affiliated with the Department of Computer Science at Johns Hopkins University, Bayesian Health, the Department of Diagnostic Radiology at the University of Maryland School of Medicine, St. Jude Children’s Research Hospital, and the Department of Radiology at Johns Hopkins University School of Medicine. The paper was published in November 2024 in the journal Radiology and was supported by the National Science Foundation.
Research Methods and Process
Study Design
This was a multicenter, prospective, randomized study conducted from April 2022 to September 2022. It employed the two types of AI explanation prevalent in medical imaging: local (feature-based) explanations and global (prototype-based) explanations. The correctness and confidence level of the AI advice were within-participant factors, while the type of AI explanation was a between-participant factor. Participants were radiologists (task experts) and internal or emergency medicine physicians (task non-experts) who read chest radiographs and received simulated AI advice. Generalized linear mixed-effects models were used to analyze the effects of the experimental variables on diagnostic accuracy, efficiency, physicians’ perception of AI advice, and “simple trust” (i.e., the swiftness with which physicians aligned with or diverged from the AI advice).
Study Participants and Data Collection
The study enrolled 220 physicians (median age, 30 years; 146 men): 132 radiologists and 88 internal or emergency medicine physicians. Each physician read eight chest radiograph cases and received simulated AI advice for each. The correctness and confidence level of the AI advice varied randomly across cases, with each participant receiving correct advice for six randomly selected cases and incorrect advice for the remaining two. The type of AI explanation was randomly assigned between participants: local explanations were presented as annotated bounding boxes highlighting abnormal regions on the radiograph, whereas global explanations were presented as a visual comparison between the case image and a prototypical image from the AI training dataset.
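To make the allocation of experimental factors concrete, here is a minimal Python sketch of how such a randomization scheme could be set up. It is not the authors’ code: the function and field names (assign_participant, advice_correct, etc.) and the way confidence is assigned per case are illustrative assumptions.

```python
import random

EXPLANATION_TYPES = ["local", "global"]  # between-participant factor


def assign_participant(participant_id, cases, rng):
    """Assign one explanation type per participant and randomize which six of
    the eight cases receive correct AI advice (the remaining two incorrect)."""
    explanation = rng.choice(EXPLANATION_TYPES)
    correctness = [True] * 6 + [False] * 2                      # within-participant factor
    rng.shuffle(correctness)
    confidence = [rng.choice(["high", "low"]) for _ in cases]   # simplified within-participant factor
    return {
        "participant": participant_id,
        "explanation_type": explanation,
        "trials": [
            {"case": case, "advice_correct": correct, "advice_confidence": conf}
            for case, correct, conf in zip(cases, correctness, confidence)
        ],
    }


# Example: build the trial schedule for one physician reading eight chest radiograph cases.
rng = random.Random(0)
schedule = assign_participant(1, [f"case_{i}" for i in range(1, 9)], rng)
```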
Data Analysis
Generalized linear mixed-effects models were used to analyze the data, with control variables including physicians’ knowledge of AI, demographic characteristics, and task expertise. Holm-Sidak corrections were applied to adjust for multiple comparisons.
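As a rough illustration of this kind of analysis pipeline, the following Python sketch uses pandas and statsmodels. It is not the study’s actual code: the data file, column names, and model formula are assumptions, and statsmodels’ MixedLM fits only linear mixed models, so a binary outcome such as diagnostic accuracy would require a generalized (logistic) mixed model, for example via pymer4 or R’s lme4.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

# Hypothetical trial-level data: one row per physician-case reading.
df = pd.read_csv("trial_level_data.csv")

# Linear mixed-effects model for a continuous outcome (e.g., log time spent on AI advice),
# with a random intercept per physician for repeated measures and control variables
# for AI knowledge, demographics, and task expertise.
efficiency_model = smf.mixedlm(
    "log_advice_time ~ explanation_type * advice_correct + advice_confidence"
    " + ai_knowledge + age + task_expert",
    data=df,
    groups=df["physician_id"],
).fit()
print(efficiency_model.summary())

# Holm-Sidak adjustment across a family of hypothesis tests (p values here are illustrative only).
raw_p = np.array([0.001, 0.01, 0.048, 0.17, 0.22])
reject, adjusted_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm-sidak")
```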
Results
Diagnostic Accuracy
When AI advice was correct, local explanations led to significantly higher diagnostic accuracy than global explanations (β = 0.86, p < 0.001). When AI advice was incorrect, the type of explanation did not significantly affect diagnostic accuracy (β = -0.23, p = 0.39). There was also an interaction between AI confidence level and physician task expertise: task non-experts benefited more from local explanations when AI confidence was high, whereas task experts benefited more from local explanations when AI confidence was low.
Diagnostic Efficiency
Local explanations significantly reduced the time physicians spent considering AI advice (β = -0.19, p = 0.01), indicating that local explanations improved diagnostic efficiency. The correctness of AI advice did not significantly affect diagnostic efficiency (β = -0.06, p = 0.17).
Physicians’ Perception of AI Advice
Neither the type of AI explanation nor the confidence level of AI advice significantly affected physicians’ perception of AI advice (β = 0.35, p = 0.07 and β = -0.16, p = 0.22, respectively). However, there was an interaction between physician task expertise and the correctness of AI advice: task experts perceived a greater difference between correct and incorrect AI advice (β = 0.84, p < 0.001).
Simple Trust Mechanism
Local explanations significantly increased physicians’ “simple trust” in AI advice (β = 1.32, p = 0.048), meaning that physicians aligned more quickly with AI advice. This mechanism helped improve diagnostic accuracy when AI advice was correct but could lead to overreliance when AI advice was incorrect.
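As a toy illustration only (not the study’s actual measure), “simple trust” in this sense could be proxied by how quickly a physician commits to agreeing or disagreeing with the AI advice; the function and variable names below are purely hypothetical.

```python
def simple_trust_proxy(agreed_with_ai: bool, seconds_deliberating: float) -> float:
    """Faster agreement -> more positive (higher simple trust);
    faster disagreement -> more negative (lower simple trust)."""
    swiftness = 1.0 / max(seconds_deliberating, 1e-6)
    return swiftness if agreed_with_ai else -swiftness


# Agreeing after 5 s signals more simple trust than agreeing after 30 s.
print(simple_trust_proxy(True, 5.0), simple_trust_proxy(True, 30.0), simple_trust_proxy(False, 5.0))
```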
Conclusion
This study demonstrated that the type of AI explanation significantly affected physicians’ diagnostic performance and trust in AI, even when the physicians themselves were not aware of these effects. Local explanations improved diagnostic accuracy and efficiency when AI advice was correct but could also increase overreliance on incorrect advice. Future AI decision support systems should therefore carefully account for how different explanation types interact with AI uncertainty and users’ experience levels.
Research Highlights
- Advantages of Local Explanations: Local explanations significantly improved physicians’ diagnostic accuracy and efficiency when AI advice was correct.
- Simple Trust Mechanism: Local explanations increased physicians’ “simple trust” in AI advice, which could help reduce underreliance on correct advice but might also increase overreliance on incorrect advice.
- Interaction with Task Expertise: Task non-experts benefited more from local explanations when AI confidence was high, while task experts benefited more from local explanations when AI confidence was low.
Significance and Value of the Study
This study provides important insights into the application of AI in radiology diagnostics, highlighting the critical role of AI explanation types in physician-AI collaboration. The findings suggest that the design of AI systems should carefully consider explanation types, AI confidence levels, and user experience to optimize the clinical application of AI. Future research could further explore other explanation types and representations of AI uncertainty to enhance the transparency and interpretability of AI in medical decision-making.