MediVision: Empowering Colorectal Cancer Diagnosis and Tumor Localization through Supervised Learning Classifications and Grad-CAM Visualization of Medical Colonoscopy Images

Academic Background

Colorectal Cancer (CRC) is one of the most common cancers worldwide, particularly among individuals over the age of 50. Early detection and accurate diagnosis are crucial for improving patient survival rates. Traditional CRC screening methods such as colonoscopy, however, rely heavily on the experience and visual judgment of physicians, which introduces subjectivity and the risk of misdiagnosis. In recent years, the application of Artificial Intelligence (AI) and Deep Learning (DL) technologies to medical image analysis has opened new possibilities for the automated diagnosis of CRC. Yet existing AI models still face challenges in feature extraction and interpretability, especially when handling images captured under varying imaging conditions, where both the generalization ability and the transparency of the models need significant improvement.

To address these issues, researchers developed the MediVision system, which combines Convolutional Neural Networks (CNNs), Gray-Level Co-occurrence Matrix (GLCM) feature extraction, and Gradient-weighted Class Activation Mapping (Grad-CAM) visualization techniques to enhance the accuracy and interpretability of CRC detection.

Source of the Paper

This study was conducted by Akella S. Narasimha Raju, K. Venkatesh, Ranjith Kumar Gatla, Shaik Jakeer Hussain, and Subba Rao Polamuri, affiliated with various research institutions. The paper was published in 2025 in the journal Cognitive Computation, titled MediVision: Empowering Colorectal Cancer Diagnosis and Tumor Localization through Supervised Learning Classifications and Grad-CAM Visualization of Medical Colonoscopy Images.

Research Process

1. Data Preprocessing and Augmentation

The study began with preprocessing and augmentation of three colonoscopy image datasets: CVC Clinic DB, Kvasir2, and Hyper Kvasir. Preprocessing steps included resizing images to 224×224 pixels, pixel normalization, and Gaussian filtering for noise reduction. Data augmentation techniques, such as random rotation, flipping, scaling, and cropping, were applied to increase dataset diversity and model generalization.
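The preprocessing pipeline described above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not the authors' implementation: the Gaussian sigma, the bilinear resampling order, and the specific augmentations chosen here are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def preprocess(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize to size x size, denoise with a Gaussian filter, normalize to [0, 1]."""
    h, w = img.shape[:2]
    img = zoom(img, (size / h, size / w, 1), order=1)  # bilinear resize
    img = gaussian_filter(img, sigma=(1, 1, 0))        # smooth each channel (sigma is illustrative)
    return img.astype(np.float32) / 255.0              # pixel normalization

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random flips and 90-degree rotations to increase dataset diversity."""
    if rng.random() < 0.5:
        img = img[:, ::-1]          # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]          # vertical flip
    return np.rot90(img, k=int(rng.integers(0, 4)))  # random rotation

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)  # stand-in colonoscopy frame
x = preprocess(raw)
y = augment(x, rng)
print(x.shape, y.shape)
```

In practice the augmentations would be applied on the fly during training so that each epoch sees a slightly different version of every image.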

2. Feature Extraction

The study employed the GLCM to extract texture features from the preprocessed images. A GLCM tabulates how often pairs of gray levels co-occur at a given spatial offset; from it, six key texture descriptors were derived: Dissimilarity, Correlation, Homogeneity, Contrast, Angular Second Moment (ASM), and Energy. These features capture the subtle texture variations that distinguish colorectal polyps and cancerous tissue from healthy mucosa.

3. Model Training and Evaluation

The study evaluated seven pre-trained CNN architectures (ResNet50, VGG16, VGG19, DenseNet201, EfficientNetB7, NASNetLarge, and InceptionResNetV2) and two integrated CNN models (Dev-22 and RV-22). Dev-22 combined DenseNet201, EfficientNetB7, and VGG16, while RV-22 combined ResNet50 and VGG19. Each model was trained and tested on the three datasets, with evaluation metrics including training accuracy, testing accuracy, F1 score, recall, and precision.
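The integrated models combine several backbones into one classifier. The sketch below shows one plausible late-fusion design in PyTorch: features from each backbone are concatenated and fed to a shared classification head. The tiny stand-in backbones, dimensions, and fusion strategy are illustrative assumptions; the paper's Dev-22 and RV-22 would use the full pre-trained networks (e.g., DenseNet201, EfficientNetB7, VGG16).

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in for a pre-trained CNN backbone; emits a fixed-size feature vector."""
    def __init__(self, out_dim: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, out_dim),
        )
    def forward(self, x):
        return self.features(x)

class FusionClassifier(nn.Module):
    """Late fusion in the style of Dev-22/RV-22: concatenate backbone features, then classify."""
    def __init__(self, backbones, num_classes: int):
        super().__init__()
        self.backbones = nn.ModuleList(backbones)
        total = sum(b.features[-1].out_features for b in backbones)
        self.head = nn.Linear(total, num_classes)
    def forward(self, x):
        feats = torch.cat([b(x) for b in self.backbones], dim=1)
        return self.head(feats)

# Three backbones mirror Dev-22's three-network design; two would mirror RV-22.
model = FusionClassifier([TinyBackbone(32), TinyBackbone(32), TinyBackbone(32)], num_classes=2)
logits = model(torch.randn(4, 3, 224, 224))
print(logits.shape)
```

Concatenating pooled features is only one fusion option; averaging the per-network logits (ensembling) is a common alternative with the same interface.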

4. Grad-CAM Visualization

To enhance model interpretability, the study used Grad-CAM to generate heatmaps that highlight the image regions most influential for each prediction. Grad-CAM computes the gradient of the class score with respect to the feature maps of a convolutional layer, weights each map by its spatially averaged gradient, and sums the weighted maps into a class activation map, helping clinicians understand the model's decision-making process.
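The Grad-CAM computation itself is compact. The sketch below implements it with PyTorch hooks on a tiny stand-in CNN (the network, its layers, and the random input are illustrative, not the paper's models): capture the target layer's activations and gradients, average the gradients spatially to get per-channel weights, and form a ReLU-clipped weighted sum upsampled to the input size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in for a trained classifier; in practice this would be VGG16, Dev-22, etc.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1),              # target conv layer for Grad-CAM
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)
target_layer = model[2]

acts, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 224, 224)
logits = model(x)
logits[0, logits.argmax()].backward()  # gradient of the predicted class score

# Grad-CAM: weight each feature map by its average gradient, sum, clip with ReLU.
weights = grads["g"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
print(cam.shape)
```

Overlaying `cam` on the original colonoscopy frame (e.g., as a semi-transparent colormap) yields the heatmaps described in the paper.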

Key Results

1. Model Performance

Among all evaluated CNN architectures, VGG16 performed exceptionally well across the three datasets. On the CVC Clinic DB dataset, VGG16 achieved a testing accuracy of 96.12%; on the Kvasir2 dataset, it reached 94.25%; and on the Hyper Kvasir dataset, it achieved 98.87%. The integrated model Dev-22 also demonstrated high accuracy across multiple datasets, particularly on the CVC Clinic DB dataset, where it achieved a testing accuracy of 97.86%.

2. Grad-CAM Visualization

Grad-CAM heatmaps successfully localized polyp regions in colonoscopy images, providing intuitive visual explanations. The Grad-CAM images generated by VGG16 and Dev-22 showed high localization accuracy, aiding clinicians in better understanding the model’s predictions.

Conclusions and Significance

The MediVision system significantly improved the accuracy and interpretability of CRC detection by combining CNNs, GLCM, and Grad-CAM techniques. The successful application of this system provides clinicians with an efficient and reliable diagnostic tool, particularly when handling complex and variable colonoscopy images, demonstrating strong generalization ability and transparency.

Research Highlights

  1. High Accuracy: VGG16 and Dev-22 achieved high detection accuracy across multiple datasets, with testing accuracy nearing 98% on the CVC Clinic DB dataset.
  2. Model Interpretability: Grad-CAM enhanced model transparency, helping clinicians understand the decision-making process and increasing trust in clinical applications.
  3. Integrated Models: The integrated designs of Dev-22 and RV-22 leveraged the strengths of different CNN architectures, further boosting model performance.

Additional Valuable Information

The study also explored the impact of different batch sizes and image dimensions on model performance, finding that smaller batch sizes (e.g., 16) improved model responsiveness, while larger batch sizes (e.g., 64) accelerated training convergence. Additionally, the study utilized the Google Colab Pro+ platform and NVIDIA Tesla T4 GPU for model training, ensuring computational efficiency and scalability.

Through this research, the MediVision system has established itself as a powerful tool for the early detection and diagnosis of CRC, with promising potential for widespread clinical application in the future.