Vision Transformers, Ensemble Model, and Transfer Learning Leveraging Explainable AI for Brain Tumor Detection and Classification

In recent years, the high incidence and lethality of brain tumors have made rapid and accurate detection and classification especially important. Brain tumors may be malignant or non-malignant, and their abnormal growth can cause long-term damage to the brain. Magnetic Resonance Imaging (MRI) is the standard method for detecting brain tumors, but relying on experts to analyze MRI images manually risks inconsistent results. Moreover, merely identifying a tumor is insufficient: its type must also be determined quickly so that treatment can begin as early as possible.

Background of the Paper

To improve the speed, reliability, and objectivity of tumor detection, this study evaluates several Deep Learning (DL) architectures, including VGG16, InceptionV3, VGG19, ResNet50, InceptionResNetV2, and Xception, and proposes IVX16, a new ensemble built from the three best-performing Transfer Learning (TL) models. Whereas much prior work focuses on binary (tumor vs. no-tumor) classification, the proposed model performs multi-class classification, yielding more precise results across tumor types.

Source of the Paper

This research was conducted by Shahriar Hossain, Amitabha Chakrabarty, Thippa Reddy Gadekallu, Mamoun Alazab, and Md. Jalil Piran, affiliated with institutions including BRAC University, Vellore Institute of Technology, Charles Darwin University, and Sejong University. The paper was published in the IEEE Journal of Biomedical and Health Informatics in March 2024.

Methods and Processes

Dataset

The study used a multi-class brain tumor dataset comprising four categories (pituitary tumor, glioma, meningioma, and no tumor), totaling 3,264 images. The dataset was divided into training, testing, and validation sets in a ratio of 80%, 10%, and 10%, respectively. Through data augmentation techniques (rescaling, shearing, zooming, and horizontal flipping), the dataset was expanded to 13,056 images, as in the sketch below.
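The paper names the augmentation operations but this summary does not give their parameter values; a minimal sketch of such a pipeline using Keras' ImageDataGenerator (parameter values, input size, and directory layout are illustrative assumptions) might look like this:

```python
# Sketch of the augmentation pipeline described above. The transform
# parameters are assumptions; the source names the operations
# (rescaling, shearing, zooming, horizontal flipping) but not their values.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,     # rescaling pixel values to [0, 1]
    shear_range=0.2,       # shearing (assumed range)
    zoom_range=0.2,        # zooming (assumed range)
    horizontal_flip=True,  # horizontal flipping
)

# "train_dir" is a hypothetical directory with one subfolder per class
# (glioma, meningioma, pituitary, no_tumor).
train_generator = train_datagen.flow_from_directory(
    "train_dir",
    target_size=(224, 224),  # assumed input size
    batch_size=32,
    class_mode="categorical",
)
```

The validation and test generators would typically apply only the rescaling step, so that evaluation images are not distorted.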

TL Models and Architectures

Six TL models were evaluated (a sketch of adapting one such backbone to this task follows the list):

1. VGG16: A 16-layer network that models complex functions effectively and has performed well in clinical analysis and other practical applications.
2. InceptionV3: A Google architecture with relatively low computational cost, well suited to large datasets.
3. VGG19: A 19-layer network with three more convolutional layers than VGG16, suited to complex datasets.
4. ResNet50: A residual network whose skip connections ease the training of deeper networks and reduce degradation problems.
5. InceptionResNetV2: Combines Inception modules with residual connections, significantly speeding up training.
6. Xception: Uses depthwise separable convolutions, saving computational resources and making efficient use of model parameters.
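The following is a minimal sketch of how one of these backbones (VGG16 here) can be adapted to the four-class problem; the head layers and hyperparameters are assumptions for illustration, not the paper's exact configuration:

```python
# Transfer learning sketch: reuse ImageNet features, train a new head.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

base = VGG16(weights="imagenet", include_top=False,
             input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained convolutional features

x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)         # assumed head size
outputs = Dense(4, activation="softmax")(x)  # four tumor classes

model = Model(base.input, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```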

IVX16 Ensemble Model

IVX16 is an ensemble of the three best-performing models: InceptionV3, VGG16, and Xception. Ensembling improves robustness to variations in the data and to differences among the constituent architectures, mitigates overfitting, and provides the added capacity needed to handle more complex image patterns.
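The exact fusion scheme of IVX16 is not detailed in this summary; one common way to ensemble three fine-tuned backbones is to average their softmax outputs, as in this hypothetical sketch (vgg16_model, inceptionv3_model, and xception_model stand for the three fine-tuned branches built as shown earlier):

```python
# Illustrative ensemble by averaging softmax outputs. Whether IVX16 fuses
# predictions this way or by another scheme (e.g., feature concatenation)
# is an assumption made for this sketch.
from tensorflow.keras.layers import Input, Average
from tensorflow.keras.models import Model

def ensemble(models, input_shape=(224, 224, 3)):
    inp = Input(shape=input_shape)
    probs = [m(inp) for m in models]  # each branch yields a 4-way softmax
    avg = Average()(probs)            # element-wise mean of probabilities
    return Model(inp, avg, name="ivx16_sketch")

# ivx16 = ensemble([vgg16_model, inceptionv3_model, xception_model])
```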

Vision Transformer (ViT) Models

Additionally, three Transformer-based vision models were compared (a sketch of EANet's core operation follows the list):

1. Swin: A Shifted Windows Transformer whose shifted-window attention mechanism improves processing efficiency and builds hierarchical representations.
2. CCT: A Compact Convolutional Transformer that combines convolutions with Transformer blocks, strengthening the handling of local information.
3. EANet: An External Attention Network that replaces self-attention with attention over small learnable external memories, improving computational efficiency and performance.
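For intuition about EANet's mechanism, here is a minimal, self-contained sketch of external attention (the layer sizes and this Keras implementation are assumptions for illustration, not the paper's configuration):

```python
# External attention: tokens attend to two small learnable memories
# (Mk, Mv) shared across the dataset, instead of to each other, which
# reduces the quadratic cost of token-to-token self-attention.
import tensorflow as tf
from tensorflow.keras import layers

class ExternalAttention(layers.Layer):
    def __init__(self, dim, memory_size=64):
        super().__init__()
        self.mk = layers.Dense(memory_size, use_bias=False)  # key memory
        self.mv = layers.Dense(dim, use_bias=False)          # value memory

    def call(self, x):                       # x: (batch, tokens, dim)
        attn = self.mk(x)                    # (batch, tokens, memory_size)
        attn = tf.nn.softmax(attn, axis=1)   # normalize over tokens
        # double normalization: l1-normalize over the memory axis
        attn = attn / (tf.reduce_sum(attn, axis=2, keepdims=True) + 1e-9)
        return self.mv(attn)                 # (batch, tokens, dim)
```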

Experimental Results

Results of TL and Ensemble Models

IVX16 performed best during training and validation, achieving a peak accuracy of 96.94%. The individual TL models InceptionV3, VGG16, and Xception also performed well, but IVX16 surpassed them in both tumor detection and classification.

Results of ViT Models

Because ViT models generally require large amounts of training data, they performed poorly on this relatively small dataset and showed clear signs of overfitting.

Explainable AI

The TL models and the IVX16 model were further evaluated with LIME (Local Interpretable Model-agnostic Explanations); the generated explanation images show which regions each model relied on when classifying tumors. IVX16 identified the three tumor types more accurately, whereas the individual TL models were more prone to misclassification.
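A sketch of how LIME can be applied to an image classifier such as IVX16 follows ("ivx16" and "image" are hypothetical names; LIME's image explainer perturbs superpixels and fits a local surrogate model to highlight the regions that drove the prediction):

```python
# Explaining one prediction with LIME's image explainer.
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    image.astype("double"),  # one MRI slice as an HxWx3 array
    classifier_fn=lambda imgs: ivx16.predict(np.array(imgs)),
    top_labels=4, hide_color=0, num_samples=1000,
)
# Overlay the superpixels that support the top predicted class.
temp, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True,
    num_features=5, hide_rest=False,
)
overlay = mark_boundaries(temp / 255.0, mask)
```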

Conclusion

The IVX16 model proposed in this paper improves the accuracy and reliability of brain tumor classification by ensembling TL models. Compared with individual models and ViT models, IVX16 demonstrated stronger detection and classification capability, and its predictions were accompanied by more accurate and credible explanations. Future research will explore further ways to improve model performance and explainability, aiming to make brain tumor detection still more accurate and practical.