Clinical impact of an explainable radiomics model with amino acid PET imaging: application to the diagnosis of aggressive gliomas

Application of Explainable Machine Learning in Amino Acid PET Imaging for Glioma Diagnosis

Academic Background

Gliomas are among the most common malignant tumors of the central nervous system, and their diagnosis and treatment strategies typically rely on histopathological analysis. Histopathology, however, is invasive and time-consuming. In recent years, radiomics, which extracts large numbers of quantitative features from medical images, has emerged as a rapidly growing field. Combining machine learning (ML) algorithms with radiomics makes it possible to capture complex relationships among image features, opening new possibilities for glioma diagnosis and prognosis assessment. Nevertheless, despite the strong performance of ML models on glioma prediction tasks, their use in clinical practice remains limited, primarily because their decision-making processes lack transparency and they are difficult to integrate seamlessly into clinical workflows.

To address these challenges, explainable machine learning (XML) methods have emerged, aimed at providing explanations for model predictions. These methods help physicians understand the basis for model decisions, thereby building trust in the models. This study investigates whether an explainable radiomics model based on amino acid positron emission tomography (PET) imaging can improve nuclear medicine physicians’ assessment of glioma aggressiveness during diagnosis.

Paper Source

This study was collaboratively conducted by research teams from various French institutions, including Université de Lorraine and CHRU-Nancy. The primary authors include Shamimeh Ahrari, Timothée Zaragori, Adeline Zinsz, among others. The study was published in 2024 in the journal European Journal of Nuclear Medicine and Molecular Imaging, with DOI: 10.1007/s00259-024-07053-6.

Research Workflow

1. Research Design and Data Collection

This retrospective study included patients who underwent dynamic 6-[18F]fluoro-L-dopa (18F-FDOPA) PET imaging at the Nuclear Medicine Department of Nancy University Hospital between January 2013 and January 2023. All patients underwent conventional magnetic resonance imaging (MRI) within 30 days of the PET scan, and a histopathological diagnosis was obtained within 60 days by surgery or stereotactic biopsy. A total of 85 patients were included, 63 in the training set and 22 in the test set.

2. Image Acquisition and Preprocessing

All patients fasted for at least four hours before the PET scan. Some patients received carbidopa one hour before the scan to enhance cerebral tracer uptake. PET scans were acquired over 30 minutes on one of two imaging systems, a Siemens Biograph 6 True Point PET/CT or a Philips Vereos PET/CT. Static images were reconstructed from the last 20 minutes of the acquisition, and dynamic images were divided into 30 frames of 1 minute each. Static images were spatially resampled with linear interpolation to isotropic voxels of 2×2×2 mm³.
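The resampling step can be sketched as follows. This is a minimal illustration using scipy rather than the authors' actual pipeline, and it assumes a NumPy volume with a known voxel spacing:

```python
import numpy as np
from scipy.ndimage import zoom

def resample_to_isotropic(volume, spacing, target=(2.0, 2.0, 2.0)):
    """Resample a 3-D volume to isotropic voxels with linear interpolation."""
    factors = [s / t for s, t in zip(spacing, target)]
    return zoom(volume, factors, order=1)  # order=1 -> trilinear interpolation

# Toy static PET frame with anisotropic 2 x 2 x 3 mm voxels
pet = np.random.rand(64, 64, 40).astype(np.float32)
iso = resample_to_isotropic(pet, spacing=(2.0, 2.0, 3.0))
print(iso.shape)  # -> (64, 64, 60): the z axis is upsampled from 40 to 60 slices
```

Linear (order-1) interpolation is a common choice for intensity images because it avoids the ringing that higher-order splines can introduce near sharp uptake boundaries.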

3. Feature Extraction

From static and dynamic PET images, 208 radiomics features were extracted, including first-order statistical, morphological, and textural features. Feature extraction utilized the PyRadiomics software package and in-house tools. Additionally, conventional features such as metabolic tumor volume and tumor-to-background ratio (TBR) were also derived.
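While the 208 radiomics features came from PyRadiomics and in-house tools, the conventional features are simple enough to sketch directly. The helper below is a hypothetical illustration, not the study's code, assuming binary tumor and background masks on a static PET volume:

```python
import numpy as np

def conventional_pet_features(suv, tumor_mask, background_mask, voxel_volume_ml=0.008):
    """Metabolic tumor volume (ml) and tumor-to-background ratios from a static PET volume.
    A 2 x 2 x 2 mm voxel is 8 mm^3 = 0.008 ml."""
    mtv = tumor_mask.sum() * voxel_volume_ml
    bkg = suv[background_mask].mean()
    return {"MTV_ml": mtv,
            "TBR_mean": suv[tumor_mask].mean() / bkg,
            "TBR_max": suv[tumor_mask].max() / bkg}

# Toy volume: tumor uptake twice the healthy background
suv = np.ones((10, 10, 10))
tumor = np.zeros_like(suv, dtype=bool)
tumor[4:6, 4:6, 4:6] = True          # 8 tumor voxels
bkg = np.zeros_like(suv, dtype=bool)
bkg[0:2, :, :] = True                # background region away from the tumor
suv[tumor] = 2.0
feats = conventional_pet_features(suv, tumor, bkg)
print(feats)  # MTV = 8 voxels * 0.008 ml = 0.064 ml; TBR_mean = TBR_max = 2.0
```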

4. Model Training and Evaluation

Samples were divided into training (75%) and test (25%) sets using stratified random sampling. Zero-variance features were removed, and the remaining features were normalized. Hierarchical clustering based on Spearman correlation was used to group redundant features and retain the most informative ones. An ensemble classifier combining logistic regression, support vector machines, random forests, and gradient-boosted decision trees was then trained, with hyperparameters optimized by 5-fold cross-validation.
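A rough scikit-learn/scipy sketch of this pipeline on synthetic data follows; the clustering threshold, the choice of cluster representative, and the hyperparameter grid are illustrative assumptions, not the authors' settings:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the radiomics feature matrix (85 patients)
X, y = make_classification(n_samples=85, n_features=40, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# 1) Cluster features on Spearman correlation; keep one representative per cluster
rho, _ = spearmanr(X_tr)                 # 40 x 40 rank-correlation matrix
dist = 1.0 - np.abs(rho)                 # correlated features -> small distance
Z = linkage(dist[np.triu_indices_from(dist, k=1)], method="average")
clusters = fcluster(Z, t=0.5, criterion="distance")
keep = [np.where(clusters == c)[0][0] for c in np.unique(clusters)]

# 2) Soft-voting ensemble of the four base learners, tuned with 5-fold CV
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    voting="soft")
grid = GridSearchCV(ensemble, {"rf__n_estimators": [50, 100]}, cv=5)
grid.fit(X_tr[:, keep], y_tr)
print(grid.score(X_te[:, keep], y_te))
```

Soft voting averages the calibrated class probabilities of the four learners, which is why the SVM is fitted with `probability=True`.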

5. Model Explanation

Three XML methods—Local Interpretable Model-agnostic Explanations (LIME), Anchor, and Shapley Additive Explanations (SHAP)—were utilized to generate explanations for each test sample. These explanations were visualized and provided to physicians to help them understand the basis of the model’s predictions.
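LIME's core idea, fitting a locally weighted linear surrogate to a black-box model around one sample, can be sketched without the lime package. The perturbation scale and proximity kernel below are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Black-box classifier standing in for the radiomics model
X, y = make_classification(n_samples=85, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

def lime_like_explanation(model, x, n_samples=500, kernel_width=1.0, seed=0):
    """Local linear surrogate around one sample: the core idea behind LIME."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))  # perturb around x
    p = model.predict_proba(Z)[:, 1]                         # query the black box
    w = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2 / kernel_width ** 2)
    surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)
    return surrogate.coef_                                   # per-feature local attributions

coefs = lime_like_explanation(model, X[0])
top = np.argsort(np.abs(coefs))[::-1][:3]
print("most influential features for this sample:", top)
```

SHAP instead distributes the prediction among features via Shapley values, and Anchor searches for if-then rules that "anchor" the prediction; all three are model-agnostic, which is what lets them be attached to an arbitrary ensemble.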

6. Physician Evaluation

Eighteen nuclear medicine physicians from eight institutions participated in the evaluation process. This was conducted in two phases: in the first phase, physicians classified 22 test samples based solely on conventional MRI and PET data; in the second phase, they re-evaluated the same samples by incorporating predictions and explanations from the radiomics model. Physicians were assessed on diagnostic accuracy, inter-rater agreement, and confidence levels in their evaluations.

Research Results

1. Model Performance

On the test set, the radiomics model achieved an AUC of 0.718 and a diagnostic accuracy of 0.775. The physicians’ diagnostic accuracy significantly improved in the second phase (0.775 vs. 0.717, p = 0.007), with a 6% increase in sensitivity and a 12% increase in specificity.

2. Physician Evaluation Results

In the second phase, inter-rater agreement among physicians significantly improved, with Fleiss’s kappa increasing from 0.609 to 0.747. Additionally, physicians’ confidence levels in diagnoses were significantly enhanced (p < 0.001). Among the three explainability methods, Anchor and SHAP were effective in 75% and 72% of cases, respectively, substantially outperforming LIME (p ≤ 0.001).
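Fleiss' kappa, the agreement statistic reported above, is computed from a subjects × categories matrix of rating counts. A minimal implementation on toy data (the counts below are invented for illustration, not the study's ratings):

```python
import numpy as np

def fleiss_kappa(ratings):
    """Fleiss' kappa for a (subjects x categories) count matrix.
    Assumes every subject was rated by the same number of raters n."""
    ratings = np.asarray(ratings, dtype=float)
    N, _ = ratings.shape
    n = ratings[0].sum()                       # raters per subject
    p_j = ratings.sum(axis=0) / (N * n)        # overall category proportions
    P_i = (np.sum(ratings ** 2, axis=1) - n) / (n * (n - 1))  # per-subject agreement
    P_bar, P_e = P_i.mean(), np.sum(p_j ** 2)  # observed vs chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 18 raters, 4 cases, binary aggressive / non-aggressive calls
counts = np.array([[18, 0], [16, 2], [3, 15], [0, 18]])
print(round(fleiss_kappa(counts), 3))  # -> 0.748
```

Like Cohen's kappa, the statistic rescales observed agreement by the agreement expected under chance, so a value of 1 means perfect agreement and 0 means chance-level agreement.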

3. Impact of Model Explanations

When the model’s predictions were correct, physicians’ diagnostic accuracy improved significantly; when the predictions were incorrect, physicians’ accuracy declined slightly. The model’s explanations also helped physicians interpret the imaging data, strengthening their confidence in their diagnoses.

Conclusions and Significance

The study demonstrated the potential value of an explainable radiomics model based on amino acid PET imaging for assessing glioma aggressiveness. By offering model explanations, this approach enhanced physicians’ diagnostic accuracy and confidence while improving inter-rater consistency. These findings point to a new pathway for integrating machine learning models into clinical practice, particularly in neuro-oncology. Future studies should explore the application of such models in other neuro-oncological tasks, such as glioma recurrence detection, to promote the wider adoption of machine learning algorithms in healthcare.

Research Highlights

  1. Innovative Methods: For the first time, this study integrated explainable machine learning (LIME, Anchor, SHAP) with radiomics models for glioma diagnosis support.
  2. Clinical Practicality: The two-phase physician evaluation validated the model’s effectiveness in real clinical scenarios, providing valuable insights for ML model translation to practice.
  3. Multi-Center Collaboration: Conducted by research teams across several medical institutions in France, ensuring broad applicability of the findings.
  4. Diverse Data: Data from different imaging systems enhanced model robustness and generalizability.

Additional Insights

The study highlighted that while the model enhanced diagnostic accuracy in most cases, its predictions might be biased in rare cases like pleomorphic xanthoastrocytomas. Additionally, the authors emphasized the importance of physician familiarity with machine learning models, suggesting future exploration of how experience with ML affects decision-making.