PICK: Predict and Mask for Semi-Supervised Medical Image Segmentation
Academic Background
Accurate segmentation of medical images is crucial in clinical practice, as it provides vital insights into organ/tumor characteristics such as volume, location, and shape. Recent studies have highlighted the significant potential of data-driven deep learning models in medical image segmentation. However, these models typically require a large amount of paired image-label data, which is challenging to obtain due to the specialized expertise required from clinical physicians. In scenarios with predominantly unlabeled data, semi-supervised learning (SSL) has emerged as a popular paradigm for medical image segmentation, leveraging both limited labeled and abundant unlabeled data efficiently.
Existing SSL approaches primarily fall into two categories: pseudo-labeling and consistency-based co-training. Pseudo-labeling focuses on selecting reliable pseudo-labels, while co-training emphasizes sub-network diversity for complementary information extraction. However, both paradigms struggle with the inevitable erroneous predictions from unlabeled data, which poses a risk to task-specific decoders and ultimately impacts model performance. To address this challenge, the authors propose a novel SSL method called PICK (Predict and Mask for Semi-Supervised Medical Image Segmentation), which operates by masking and predicting pseudo-label-guided attentive regions to exploit unlabeled data.
Paper Source
The paper is co-authored by Qingjie Zeng, Zilin Lu, Yutong Xie, and Yong Xia from the National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology at Northwestern Polytechnical University, the Australian Institute for Machine Learning at the University of Adelaide, and the Research & Development Institute of Northwestern Polytechnical University in Shenzhen and Ningbo. The paper was accepted by the International Journal of Computer Vision on December 9, 2024, and published in 2025.
Research Content
Research Process
The core idea of the PICK model is to extract useful information from unlabeled data by masking and reconstructing pseudo-label-guided attentive regions. The model features a shared encoder and three task-specific decoders: a primary decoder, a masked image modeling (MIM) decoder, and an auxiliary decoder.
- Primary Decoder: The primary decoder is supervised solely by labeled data and generates pseudo-labels for unlabeled data, identifying potential target regions in unlabeled images.
- MIM Decoder: The MIM decoder reconstructs the masked target regions, optimizing through a reconstruction task to enhance the encoder’s understanding of target semantics.
- Auxiliary Decoder: The auxiliary decoder learns from the reconstructed images, with its predictions constrained by the primary decoder, thereby reconciling the segmentation and reconstruction tasks.
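The masking step that drives this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mask_attentive_regions` and the choice of zero as the mask value are hypothetical, and the pseudo-label is taken as a binary foreground map produced by the primary decoder.

```python
import numpy as np

def mask_attentive_regions(image, pseudo_label, mask_value=0.0):
    """Blank out the regions of `image` that the pseudo-label flags as
    foreground (the 'attentive' target regions). In PICK's scheme, the
    MIM decoder would then be trained to reconstruct these regions,
    pushing the shared encoder to learn target semantics."""
    masked = image.copy()
    masked[pseudo_label > 0] = mask_value  # hypothetical mask value
    return masked

# Toy 2D example: a 4x4 "image" whose center the pseudo-label marks as target.
image = np.arange(16, dtype=float).reshape(4, 4)
pseudo_label = np.zeros((4, 4), dtype=int)
pseudo_label[1:3, 1:3] = 1
masked = mask_attentive_regions(image, pseudo_label)
```

In this sketch the masked image retains all background pixels unchanged, so the reconstruction task is focused entirely on the pseudo-label-guided regions.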
Experimental Results
The PICK model was evaluated on five medical benchmarks, including single organ/tumor segmentation, multi-organ segmentation, and domain-generalized tasks. The results indicate that PICK outperforms state-of-the-art methods. For instance, in lung tumor segmentation, PICK surpasses the best competitor, CauSSL, by 2.46% in Dice coefficient when utilizing 20% labeled data.
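For reference, the Dice coefficient used in these comparisons measures the overlap between a predicted and a ground-truth segmentation mask. A minimal NumPy sketch (not tied to the paper's evaluation code):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2*|A ∩ B| / (|A| + |B|); ranges over [0, 1], higher is better.
    `eps` guards against division by zero when both masks are empty."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Toy binary masks: each has 3 foreground pixels, 2 of which overlap.
pred = np.array([[1, 1, 0], [1, 0, 0]])
target = np.array([[1, 1, 0], [0, 0, 1]])
score = dice_coefficient(pred, target)  # 2*2 / (3+3) ~= 0.667
```

A reported gain of 2.46% in Dice thus corresponds to 2.46 additional percentage points of mask overlap relative to the best competitor.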
Conclusions and Significance
The PICK model introduces a novel approach to SSL by integrating pseudo-label-guided region masking and reconstruction, distinguishing it from current SSL methods focused on pseudo-labeling or consistency-based co-training. The model effectively mitigates the impact of incorrect pseudo-labels by embedding unlabeled target information into the encoder while preserving the integrity of the primary decoder. Extensive experiments across five public datasets validate PICK’s superiority, demonstrating significant improvements over leading SSL methods.
Research Highlights
- Novel Masking and Reconstruction Strategy: PICK leverages pseudo-label-guided region masking and reconstruction to mine unlabeled data, significantly improving model performance.
- Multi-Decoder Design: The model employs three task-specific decoders that work collaboratively to address the conflicting objectives between segmentation and reconstruction tasks.
- Extensive Experimental Validation: PICK demonstrates superior performance across various medical image segmentation tasks, proving its versatility and robustness.
Other Valuable Information
The code for the PICK model is publicly available on GitHub, allowing researchers to access and further explore the model.
Summary
The PICK model proposes an innovative SSL method for medical image segmentation by integrating pseudo-label-guided region masking and reconstruction. The model significantly outperforms existing methods across multiple tasks, providing a robust and efficient solution for medical image segmentation. With its novel approach and strong experimental validation, PICK holds great promise for advancing the field of medical image analysis and improving clinical diagnostic tools.