Multimodal Disentangled Variational Autoencoder with Game Theoretic Interpretability for Glioma Grading

Background

Gliomas are the most common primary brain tumors of the central nervous system. Based on cellular activity and invasiveness, the World Health Organization (WHO) classifies them into grades I to IV, with grades I and II referred to as low-grade gliomas (LGG) and grades III and IV as high-grade gliomas (HGG). In clinical practice, treatment often needs to be personalized according to tumor grade, so accurate glioma grading is crucial for treatment decisions, personalized therapy, and prognosis prediction. The current gold standard for grading remains histopathological analysis of tissue obtained by surgical biopsy. However, biopsy is invasive, cannot provide results in real time, and may lead to complications such as seizures, infection, or even tumor seeding along the needle tract. Developing a system that can determine glioma grade non-invasively and in a timely manner before surgery is therefore of significant importance.

Magnetic resonance imaging (MRI) is widely used in preoperative diagnosis, treatment decision-making, and prognosis evaluation for glioma patients and has proven to be a promising non-invasive tool. Conventional MRI protocols for glioma include four modalities: fluid-attenuated inversion recovery (FLAIR), T1-weighted imaging (T1), T1-weighted contrast-enhanced imaging (T1CE), and T2-weighted imaging (T2). Each modality reflects different tissue signals: FLAIR shows the highly heterogeneous signal of tumor infiltration and edema; T1 provides anatomical detail; T1CE highlights enhancing tumor where the blood-brain barrier is disrupted, helping to delineate non-enhancing regions and necrotic tissue; and T2 is sensitive to edema, providing signal for tumor boundaries and edema extent. Together, conventional multi-modal MRI can clearly show the signal intensity and mass effect of hemorrhage, necrosis, and edema in gliomas. Radiologists can make treatment decisions based on this comprehensive information, but synthesizing it manually is cumbersome and inefficient.

In recent years, advanced parametric MRI techniques such as diffusion tensor imaging (DTI) and apparent diffusion coefficient (ADC) mapping have provided fiber-density indices, ADC histograms, and permeability indicators, showing strong potential for glioma grading. However, these advanced sequences are time-consuming and expensive, which limits their routine clinical use for every patient. Effectively integrating conventional multi-modal MRI for accurate glioma grading is therefore a pressing need.

Source Introduction

The paper was authored by Jianhong Cheng, Min Gao, Jin Liu, Hailin Yue, Hulin Kuang, Jun Liu, and Jianxin Wang, whose affiliations include the Key Laboratory of Bioinformatics of Hunan Province at the School of Computer Science and Engineering, Central South University; the Guizhou Aerospace Measurement and Testing Technology Research Institute; the Department of Imaging at the Second Xiangya Hospital of Central South University; and the Hunan Imaging Quality Control Center, among other institutions. It was published in the IEEE Journal of Biomedical and Health Informatics in February 2022.

Detailed Research Process

a) Research Workflow

This study proposes a Multi-Modal Disentangled Variational Autoencoder (MMD-VAE) for glioma grading based on radiomics features extracted from preoperative multi-modal MRI. The workflow comprises several data-processing and experimental steps:

  1. Data Collection and Processing: The data came from the Multimodal Brain Tumor Segmentation Challenge (BraTS) and from clinical cases at the Second Xiangya Hospital of Central South University. In total, 1752 MRI images from 438 glioma patients were collected, covering the FLAIR, T1, T1CE, and T2 modalities.

  2. ROI Definition and Segmentation: All preoperative MRI images were first reoriented to the Left-Posterior-Superior (LPS) coordinate system, registered to a unified T1 anatomical template, and skull-stripped using the Brain Extraction Tool (BET). Following the BraTS convention, three ROIs were considered for segmentation: non-enhancing tumor (NET), enhancing tumor (ET), and edema (ED). This study focuses primarily on the NET region because of its strong heterogeneity and predictive performance.

  3. Radiomics Feature Definition and Extraction: The ROI images were processed through nine image filters (original, i.e., no filter applied, plus square, square root, logarithm, exponential, gradient, wavelet transform, local binary pattern, and Laplacian of Gaussian), from which 2153 quantitative features were extracted. The features fall into seven categories: first-order statistics, shape, Gray-Level Co-occurrence Matrix (GLCM), Gray-Level Dependence Matrix (GLDM), Gray-Level Run Length Matrix (GLRLM), Gray-Level Size Zone Matrix (GLSZM), and Neighborhood Gray-Tone Difference Matrix (NGTDM). A feature-extraction sketch follows this list.

  4. Multi-Modal Disentangled Variational Autoencoder: The MMD-VAE encodes the features of each modality through a variational-autoencoder encoder to extract latent representations, which are then disentangled into shared and distinct representations. The encoder consists of multiple dense layers with ReLU activations, followed by two linear mappings that perform the disentanglement. The decoder reconstructs the input features from the latent vectors, and the disentangled shared representations are also used for cross-modal reconstruction (see the model sketch after this list).

  5. Glioma Grading Predictor: The shared and distinct representations of all modalities are concatenated and fed into the glioma grading predictor, composed of two hidden layers and one output layer, which outputs the predicted probability through a sigmoid function (a predictor sketch also follows this list).
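
As referenced in step 3, here is a minimal sketch of per-modality radiomics extraction with PyRadiomics. The file names, the sigma values for the Laplacian-of-Gaussian filter, and other settings are illustrative assumptions rather than the paper's exact configuration.

    from radiomics import featureextractor

    # Enable the filtered image types described above; names follow
    # PyRadiomics conventions ('Original' means no filter is applied).
    # The LoG filter requires explicit sigma values (assumed here).
    extractor = featureextractor.RadiomicsFeatureExtractor(sigma=[1.0, 3.0])
    extractor.enableAllFeatures()
    for image_type in ["Original", "Square", "SquareRoot", "Logarithm",
                       "Exponential", "Gradient", "Wavelet", "LBP3D", "LoG"]:
        extractor.enableImageTypeByName(image_type)

    features = {}
    for modality in ["flair", "t1", "t1ce", "t2"]:
        # Hypothetical file layout: one image and one NET mask per patient.
        result = extractor.execute(f"{modality}.nii.gz", "net_mask.nii.gz")
        # Drop diagnostic metadata, keeping only the feature values.
        features[modality] = {k: v for k, v in result.items()
                              if not k.startswith("diagnostics")}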
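
As referenced in step 4, the following is a minimal PyTorch sketch of one plausible reading of the MMD-VAE: each modality's feature vector is encoded, the latent code is disentangled via two linear mappings into shared and distinct parts, and the shared codes drive cross-modal reconstruction. Layer widths, latent sizes, the loss weighting, and the exact cross-reconstruction scheme are assumptions, not the paper's reported design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ModalityVAE(nn.Module):
        def __init__(self, in_dim=2153, hidden=256, z_shared=32, z_distinct=32):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            # Two linear mappings disentangle the latent representation into
            # shared and distinct parts (each with a mean and log-variance).
            self.shared_head = nn.Linear(hidden, 2 * z_shared)
            self.distinct_head = nn.Linear(hidden, 2 * z_distinct)
            self.decoder = nn.Sequential(
                nn.Linear(z_shared + z_distinct, hidden), nn.ReLU(),
                nn.Linear(hidden, in_dim),
            )

        @staticmethod
        def reparameterize(mu, logvar):
            # Standard VAE reparameterization trick.
            return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

        def forward(self, x):
            h = self.encoder(x)
            mu_s, logvar_s = self.shared_head(h).chunk(2, dim=-1)
            mu_d, logvar_d = self.distinct_head(h).chunk(2, dim=-1)
            z_s = self.reparameterize(mu_s, logvar_s)
            z_d = self.reparameterize(mu_d, logvar_d)
            recon = self.decoder(torch.cat([z_s, z_d], dim=-1))
            return recon, (mu_s, logvar_s, z_s), (mu_d, logvar_d, z_d)

    def kl_term(mu, logvar):
        # KL divergence from N(mu, sigma^2) to the standard normal prior.
        return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    def mmd_vae_loss(vaes, inputs):
        # Self-reconstruction + KL for each modality, plus cross-modal
        # reconstruction: modality j is decoded from modality i's shared
        # code and modality j's own distinct code (one possible scheme).
        outs = {m: vae(inputs[m]) for m, vae in vaes.items()}
        loss = 0.0
        for m, (recon, (mu_s, lv_s, _), (mu_d, lv_d, _)) in outs.items():
            loss = loss + F.mse_loss(recon, inputs[m])
            loss = loss + kl_term(mu_s, lv_s) + kl_term(mu_d, lv_d)
        for mi in outs:
            for mj in outs:
                if mi == mj:
                    continue
                z_s_i = outs[mi][1][2]   # shared code from modality i
                z_d_j = outs[mj][2][2]   # distinct code from modality j
                cross = vaes[mj].decoder(torch.cat([z_s_i, z_d_j], dim=-1))
                loss = loss + F.mse_loss(cross, inputs[mj])
        return loss

    # Toy usage: four modality VAEs over 2153-dimensional radiomics vectors.
    vaes = {m: ModalityVAE() for m in ["flair", "t1", "t1ce", "t2"]}
    inputs = {m: torch.randn(8, 2153) for m in vaes}
    loss = mmd_vae_loss(vaes, inputs)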
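
And a sketch of the grading predictor from step 5: the shared and distinct codes of all four modalities are concatenated and passed through two hidden layers and a sigmoid output (widths assumed to match the sketch above).

    import torch
    import torch.nn as nn

    class GradingPredictor(nn.Module):
        def __init__(self, in_dim=4 * (32 + 32), hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, z):
            # z: concatenated shared + distinct codes of all modalities.
            return torch.sigmoid(self.net(z))  # predicted probability of HGG

    predictor = GradingPredictor()
    z = torch.randn(8, 4 * 64)     # toy batch of concatenated latent codes
    p_hgg = predictor(z)           # shape (8, 1), values in (0, 1)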

b) Major Research Results

Experimental results indicate that the proposed MMD-VAE model demonstrates excellent predictive performance on two benchmark datasets. It achieved an AUC of 0.9939, an accuracy of 98.46%, sensitivity of 100%, and specificity of 94.12% on the public dataset. On the cross-institutional private dataset, it attained an AUC of 0.9611, accuracy of 94.32%, sensitivity of 96.72%, and specificity of 88.89%. These quantitative results and interpretations may help radiologists better understand gliomas and support improved clinical treatment decisions.
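
For reference, the four reported metrics can be computed from predicted probabilities as in this generic scikit-learn sketch (toy labels and scores, not the paper's evaluation code):

    import numpy as np
    from sklearn.metrics import confusion_matrix, roc_auc_score

    y_true = np.array([1, 1, 0, 1, 0])            # toy labels: 1 = HGG, 0 = LGG
    y_prob = np.array([0.9, 0.8, 0.3, 0.7, 0.4])  # predicted P(HGG)
    y_pred = (y_prob >= 0.5).astype(int)

    auc = roc_auc_score(y_true, y_prob)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)                  # true positive rate (HGG recall)
    specificity = tn / (tn + fp)                  # true negative rate (LGG recall)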

c) Research Conclusion and Significance

This study proposes a highly effective multi-modal disentangled variational autoencoder for glioma grading based on radiomics features extracted from preoperative multi-modal MRI. The approach not only improves grading accuracy but also enhances model interpretability, which matters for clinical diagnosis and personalized treatment decisions. Through the SHAP method, the model quantitatively interprets the contribution of important features to the predicted grade, deepening radiologists' understanding of gliomas and supporting prognostic assessment.
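
As a hedged illustration of how such attributions are obtained, the sketch below applies shap's model-agnostic KernelExplainer to a placeholder prediction function; the paper's actual explainer and wrapped model are not specified here, so the function and data in this snippet are assumptions.

    import numpy as np
    import shap

    def predict_fn(x: np.ndarray) -> np.ndarray:
        # Placeholder: in practice this would wrap the trained MMD-VAE
        # pipeline, mapping radiomics feature vectors to P(HGG).
        return x.mean(axis=1)

    # A small background sample approximates the expected model output;
    # nsamples bounds the cost of estimating Shapley values per patient.
    background = shap.sample(np.random.rand(100, 2153), 10)
    explainer = shap.KernelExplainer(predict_fn, background)
    shap_values = explainer.shap_values(np.random.rand(5, 2153), nsamples=100)
    # shap_values[i, j] quantifies the contribution of feature j to
    # patient i's predicted grade, relative to the background average.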

d) Research Highlights

  1. Innovative Disentangled Representation Learning: By disentangling each modality's latent representation into shared and distinct components, the multi-modal variational autoencoder extracts complementary information across modalities, improving the accuracy of glioma grading.
  2. Quantitative Interpretability Model: The SHAP method is used to quantitatively interpret the contribution of important features to grading, enhancing the model’s interpretability and helping clinicians better understand and apply it.
  3. Outstanding Predictive Performance: The model shows extremely high AUC and accuracy on two benchmark datasets, validating its effectiveness and stability.
  4. Multi-modal Integration: Successfully integrates multi-modal MRI data, enhancing the model’s robustness and generalizability.

The method proposed in this paper holds significant application value in non-invasive glioma grading and provides new ideas and methods for future imaging analysis research.