ABVS Breast Tumour Segmentation via Integrating CNN with Dilated Sampling Self-Attention and Feature Interaction Transformer


Academic Background

Breast cancer is the second most common cancer worldwide, and early, accurate detection is crucial for improving patient prognosis and reducing mortality. Although various imaging modalities (such as X-ray mammography, magnetic resonance imaging, and handheld ultrasound) are used for early breast cancer screening, they often suffer from limited resolution or strong operator dependency. To address these issues, the Automated Breast Volume Scanner (ABVS) was developed. ABVS automatically acquires a comprehensive view of the entire breast, but analyzing its images remains challenging because breast tumors vary widely in size, shape, and location. In recent years, deep learning has made significant progress in medical image analysis; in particular, convolutional neural networks (CNNs) and transformers excel at tumor segmentation and detection. However, existing CNN methods are limited in capturing global contextual information, while pure transformer architectures are computationally expensive on large 3D medical volumes. Effectively combining the strengths of CNNs and transformers has therefore become an important research direction.

Paper Source

This paper was co-authored by Yiyao Liu, Jinyao Li, Yi Yang, and others, affiliated with the School of Biomedical Engineering, Health Science Center, Shenzhen University and the Department of Ultrasonics, Union Shenzhen Hospital, Huazhong University of Science and Technology. The paper was published in 2025 in the journal Neural Networks, titled “ABVS Breast Tumour Segmentation via Integrating CNN with Dilated Sampling Self-Attention and Feature Interaction Transformer.”

Research Process

1. Research Design and Network Architecture

This study proposes a novel 3D segmentation network—DST-C—which combines a convolutional neural network (CNN) with a Dilated Sampling Self-Attention Transformer (DST). The core idea of the network is to extract local detail information through the CNN branch and capture global features through the DST branch, thereby achieving more accurate tumor segmentation. Specifically, the network consists of the following components:

  • CNN Branch: Uses residual connections to extract local detail features from the image.
  • DST Branch: Based on the Swin Transformer (ST), it introduces a dilated sampling self-attention mechanism to expand the receptive field and reduce computational complexity.
  • Spatial-Channel Attention Bridge (SCA): Connects the CNN and DST branches, fusing local and global features through spatial and channel attention mechanisms.
  • Decoder: Combines features from both branches and restores image resolution through upsampling operations.
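The SCA bridge described above can be sketched in PyTorch. This is a minimal illustration, not the paper's implementation: the module name `SCABridge` and the exact attention layout (global-average-pooled channel gate, 1×1×1 convolutional spatial gate) are assumptions chosen to show how spatial and channel attention can fuse the two branches' feature maps.

```python
import torch
import torch.nn as nn

class SCABridge(nn.Module):
    """Hypothetical spatial-channel attention bridge fusing CNN and DST features."""
    def __init__(self, channels):
        super().__init__()
        # Channel attention: squeeze spatial dims, then gate each channel.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 1x1x1 conv over the concatenated branch features.
        self.spatial_att = nn.Sequential(
            nn.Conv3d(2 * channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, cnn_feat, dst_feat):
        fused = torch.cat([cnn_feat, dst_feat], dim=1)
        spatial = self.spatial_att(fused)                 # (B, 1, D, H, W)
        channel = self.channel_att(cnn_feat + dst_feat)   # (B, C, 1, 1, 1)
        return (cnn_feat + dst_feat) * spatial * channel  # broadcast both gates

bridge = SCABridge(channels=16)
cnn_feat = torch.randn(1, 16, 8, 8, 8)   # local-detail features from the CNN branch
dst_feat = torch.randn(1, 16, 8, 8, 8)   # global features from the DST branch
out = bridge(cnn_feat, dst_feat)
print(out.shape)  # torch.Size([1, 16, 8, 8, 8])
```

The fused output keeps the input shape, so the decoder can consume it at each resolution level like an ordinary skip connection.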

2. Self-Supervised Learning Strategy

To address the scarcity of annotated medical image data, this study proposes a self-supervised learning (SSL) strategy based on Masked Image Modelling (MIM). The specific steps are as follows:

  • Mask Generation: Randomly applies cubic masks to the input image, with mask size and ratio determined through experiments.
  • Feature Extraction: The CNN branch processes unmasked images, while the DST branch processes masked images and extracts features.
  • Feature Reconstruction: Uses a simple decoder to reconstruct masked regions and calculates L1 loss at both feature and pixel levels to optimize the network.
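The mask-generation step above can be sketched as follows. This is a simplified illustration, not the authors' code: the function name and mask layout are assumptions, while the cube size of 4 and mask ratio of 40% follow the optimal setting reported in the paper.

```python
import torch

def random_cube_mask(volume, cube=4, ratio=0.4, generator=None):
    """Zero out randomly chosen cubic patches of a 3D volume.

    volume: (B, C, D, H, W). `cube` and `ratio` default to the paper's
    best setting (mask size 4, mask ratio 40%); other names are illustrative.
    """
    B, C, D, H, W = volume.shape
    gd, gh, gw = D // cube, H // cube, W // cube
    n_patches = gd * gh * gw
    n_masked = int(ratio * n_patches)
    mask = torch.ones(B, 1, gd, gh, gw)
    for b in range(B):
        idx = torch.randperm(n_patches, generator=generator)[:n_masked]
        mask[b].view(-1)[idx] = 0.0  # drop a random subset of patches
    # Expand the patch-level mask back to voxel resolution.
    mask = (mask.repeat_interleave(cube, dim=2)
                .repeat_interleave(cube, dim=3)
                .repeat_interleave(cube, dim=4))
    return volume * mask, mask

x = torch.randn(1, 1, 16, 16, 16)
masked, mask = random_cube_mask(x)
print(mask.mean())  # fraction of voxels kept, here (64 - 25) / 64 ≈ 0.61
```

During pre-training, the DST branch would receive `masked` while the CNN branch sees `x`; a light decoder then reconstructs the hidden cubes, and an L1 loss (e.g. `F.l1_loss(recon, x)`) is applied at the pixel and feature levels as described above.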

3. Postprocessing Algorithm

To improve tumor detection sensitivity and reduce the false-positive rate, this study designs an adaptive-threshold, local-range region-growing algorithm. The algorithm dynamically adjusts the segmentation threshold by comparing global and local maxima, allowing tumor regions to be delineated more accurately.
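A generic region-growing step of this kind can be sketched as below. This is not the paper's algorithm: the relative threshold `rel_thresh * prob[seed]` is a stand-in assumption for the adaptive global/local-maximum comparison, whose exact rule is not detailed here.

```python
import numpy as np
from collections import deque

def region_grow(prob, seed, rel_thresh=0.5):
    """Grow a region from `seed` over 6-connected voxels whose probability
    exceeds rel_thresh * prob[seed] (a simplified adaptive threshold)."""
    thresh = rel_thresh * prob[seed]
    grown = np.zeros(prob.shape, dtype=bool)
    grown[seed] = True
    queue = deque([seed])
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in offsets:
            nz, ny, nx = z + dz, y + dy, x + dx
            if (0 <= nz < prob.shape[0] and 0 <= ny < prob.shape[1]
                    and 0 <= nx < prob.shape[2]
                    and not grown[nz, ny, nx]
                    and prob[nz, ny, nx] >= thresh):
                grown[nz, ny, nx] = True
                queue.append((nz, ny, nx))
    return grown

# Tiny probability volume: a bright seed, one confident neighbour, one weak voxel.
prob = np.zeros((5, 5, 5))
prob[2, 2, 2] = 1.0
prob[2, 2, 3] = 0.8
prob[2, 2, 4] = 0.2
mask = region_grow(prob, (2, 2, 2))
print(mask.sum())  # 2 voxels: the seed and its 0.8 neighbour
```

Because the threshold is derived from the local seed intensity rather than a fixed global cut-off, weak tumors are not discarded outright while diffuse low-probability background is excluded.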

4. Experiments and Evaluation

Experiments were conducted on three datasets: a self-collected ABVS dataset, the publicly available KITS19 CT dataset, and the TDSC-ABUS 2023 3D breast ultrasound dataset. The results show that the DST-C network achieved a segmentation Dice coefficient of 73.65% and a sensitivity of 91.67% on the ABVS dataset, significantly outperforming other comparative methods. On the KITS19 dataset, DST-C achieved a Dice coefficient of 98.03% for kidney segmentation and 87.24% for kidney tumor segmentation, also demonstrating excellent performance.
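The Dice coefficient used throughout these results measures the overlap between the predicted and ground-truth masks, 2|A∩B| / (|A| + |B|). A minimal NumPy version (the small `eps` guard against empty masks is an implementation convenience, not from the paper):

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.array([1, 1, 0, 0])
b = np.array([1, 0, 1, 0])
print(round(dice(a, b), 3))  # 0.5: one overlapping voxel out of two in each mask
```

A Dice of 73.65% on ABVS is thus a substantial overlap for small, variably shaped tumors, where a few misclassified voxels change the score noticeably.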

Main Results

  1. Effectiveness of Network Architecture: Experiments show that the DST-C network excels in fusing local details and global contextual information. Compared to single CNN or ST branches, the dual-branch structure significantly improves segmentation accuracy.
  2. Contribution of Self-Supervised Learning: Through the SSL strategy, the network’s segmentation performance significantly improves after pre-training on unlabeled data. The optimal mask size is 4, with a mask ratio of 40%.
  3. Optimization of Postprocessing Algorithm: The adaptive threshold region-growing algorithm effectively reduces the false-positive rate while maintaining high sensitivity.
  4. Multi-Dataset Validation: DST-C performs well on the ABVS, KITS19, and TDSC-ABUS datasets, demonstrating its generalization ability.

Conclusion and Significance

The DST-C network proposed in this study successfully addresses the challenges of breast tumor segmentation in ABVS images by combining the strengths of CNN and DST. Its innovations include:

  • Dual-Branch Structure: Effectively fuses local details and global contextual information.
  • Dilated Sampling Self-Attention Mechanism: Expands the transformer’s receptive field and reduces computational complexity.
  • Self-Supervised Learning Strategy: Addresses the scarcity of annotated medical image data.
  • Adaptive Postprocessing Algorithm: Improves the accuracy and sensitivity of tumor detection.

This research not only provides a new solution for automated breast tumor segmentation but also offers valuable insights for other medical image segmentation tasks.

Research Highlights

  1. Innovative Network Architecture: The DST-C network is the first to combine CNN with a dilated sampling self-attention transformer, achieving effective fusion of local and global features.
  2. Application of Self-Supervised Learning: Utilizes mask image modeling to fully leverage unlabeled data, enhancing model performance.
  3. Multi-Dataset Validation: Validates the model’s generalization ability on multiple public and private datasets.
  4. Optimization of Postprocessing Algorithm: The adaptive threshold region-growing algorithm significantly improves tumor detection accuracy.

Other Valuable Information

This study also explores the impact of different mask sizes and ratios on the effectiveness of self-supervised learning, providing experimental evidence for future related research. Additionally, the research team has made the code publicly available, facilitating replication and improvement by other researchers.

This study offers new ideas and methods for automated breast tumor segmentation, holding significant scientific and clinical value.