Distillation of Multi-Class Cervical Lesion Cell Detection via Synthesis-Aided Pre-Training and Patch-Level Feature Alignment

2024-05-31 Fri
Cervical lesion cell detection; Knowledge distillation; Image synthesis; Computer-aided diagnosis
Background and Research Significance
Cervical cancer seriously threatens the life and health of women. According to data from the International Agency for Research on Cancer (IARC), there were approximately 604,000 new cases of cervical cancer and about 342,000 deaths globally in 2020 (Sung et al., 2021). Early screening and diagnosis allow the disease to be prevented and treated effectively, while delayed diagnosis increases the risk of serious, life-threatening complications (Schiffman, Castle, Jeronimo, Rodriguez, & Wacholder, 2007). Health organizations worldwide therefore recommend early screening as an effective means of preventing and treating cervical cancer (American College of Obstetricians and Gynecologists et al., 2010). Among screening methods, the ThinPrep cytologic test (TCT), a liquid-based cytology test, is the most commonly used and effective one for detecting cervical abnormalities and precancerous lesions (Davey et al., 2006).
However, the traditional manual review of whole slide images (WSIs) during TCT examination is both time-consuming and error-prone, and diagnoses often vary significantly between reviewers (Bengtsson & Malm, 2014). Automated cervical cell analysis methods are therefore urgently needed to help cytologists analyze cervical cell pathology images efficiently and accurately, and so reach objective diagnoses.
From a clinical perspective, the main goal of cervical TCT screening is to detect cervical lesion cells in WSIs and classify them into lesion stages according to the rules of the Bethesda system (TBS) (Nayar & Wilbur, 2017). Because the number of WSI samples is large, the initial cell detection step must be highly sensitive so that no abnormal cells are missed, which is crucial for subsequent analysis (Zhou et al., 2021).
In recent years, advances in deep learning technology have significantly improved the efficiency of detecting cervical lesion cells. Techniques like Faster R-CNN (Ren et al., 2015) and RetinaNet (Lin et al., 2017) have shown promise. However, these methods still face several problems such as incomplete annotations, class imbalance, and insufficient utilization of contextual information between cells (Zhang, Liu, et al., 2019). To address these issues, this paper proposes a distillation-based framework aimed at guiding the training of an image-level detection network through a pre-trained patch-level network.
Source Introduction
This paper is titled “Distillation of Multi-Class Cervical Lesion Cell Detection via Synthesis-Aided Pre-Training and Patch-Level Feature Alignment.” It is authored by Manman Fei, Zhenrong Shen, Zhiyun Song, Xin Wang, Linlin Yao, Xiangyu Zhao, and Lichi Zhang (*corresponding author) from the School of Biomedical Engineering, Shanghai Jiao Tong University, along with Maosong Cao and Qian Wang from the School of Biomedical Engineering, ShanghaiTech University. The article was published in the journal Neural Networks in 2024.
Detailed Research Work
Research Process
The study comprises several important steps and methods:
Design of a Patch-Level Balanced Pre-Training Model (BPM):
The study proposes a patch-level cervical cell classification model named the Balanced Pre-Training Model (BPM). An image synthesis model was used to build a class-balanced patch dataset for pre-training.
Synthetic data generated by CellGAN ensures a balanced class distribution in the training data, thereby alleviating the class imbalance problem.
The training process involves two stages: initial training with synthetic data followed by fine-tuning with real data.
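As a rough illustration of the two-stage recipe above, the sketch below builds a class-balanced list of labels to request from a class-conditional generator and then orders the two training stages. The class names, the counts, and the helper names (`imbalance_ratio`, `balanced_synthetic_labels`) are hypothetical. This summary does not give the paper's actual class counts or sampling scheme, and no CellGAN API is used here.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio between the largest and the smallest class (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def balanced_synthetic_labels(classes, per_class):
    """Labels to request from a class-conditional generator: the same
    number of patches for every class."""
    return [c for c in classes for _ in range(per_class)]


# Hypothetical real patch labels, for illustration only (not from the paper).
real_labels = ["ASC-US"] * 50 + ["LSIL"] * 30 + ["ASC-H"] * 8 + ["HSIL"] * 12

classes = sorted(set(real_labels))
synthetic_labels = balanced_synthetic_labels(classes, per_class=25)

# Stage 1 pre-trains on the balanced synthetic set, stage 2 fine-tunes on real data.
schedule = [("pretrain", synthetic_labels), ("finetune", real_labels)]

for stage, labels in schedule:
    print(stage, len(labels), imbalance_ratio(labels))
```

In this toy setting the real labels have an imbalance ratio of 6.25, while the synthetic pre-training set is balanced by construction.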
Score Correction Loss (SCL):
A Score Correction Loss (SCL) was designed to distill knowledge from the BPM model into the detection network, thereby mitigating the problem of incomplete annotation.
The SCL aims to correct the confidence scores of the detection model by comparing the patch scores predicted by the BPM with the detection network’s output scores.
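The comparison between BPM patch scores and detector scores can be sketched as a distillation loss. The summary does not state the exact form of the SCL, so the KL divergence below is an assumed stand-in, and the function names and toy score vectors are hypothetical. The BPM acts as the teacher and the detector as the student.

```python
import math


def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) for two class-probability vectors."""
    return sum(t * math.log((t + eps) / (s + eps))
               for t, s in zip(teacher, student))


def score_correction_loss(bpm_scores, det_scores):
    """Mean KL divergence over candidate patches (assumed form, not the
    paper's exact definition). Each entry is one patch's class distribution."""
    assert len(bpm_scores) == len(det_scores)
    total = sum(kl_divergence(t, s) for t, s in zip(bpm_scores, det_scores))
    return total / len(bpm_scores)


# Toy scores for two candidate patches and two classes (illustrative only).
bpm_scores = [[0.9, 0.1], [0.2, 0.8]]
det_agree = [[0.9, 0.1], [0.2, 0.8]]
det_unsure = [[0.5, 0.5], [0.5, 0.5]]

loss_agree = score_correction_loss(bpm_scores, det_agree)
loss_unsure = score_correction_loss(bpm_scores, det_unsure)
print(loss_agree, loss_unsure)
```

Under this reading, a detection that has no ground-truth box but that the BPM scores confidently as a lesion class receives a soft target from the BPM, which is one way the loss could soften the effect of incomplete annotation.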
Patch Correlation Consistency (PCC) Strategy:
The Patch Correlation Consistency (PCC) strategy was designed to utilize the contextual information between cells and enhance feature representation learning during detection.
PCC captures the contextual relationship between cells by calculating the consistency between features extracted by the detection network and the BPM network.
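One way to read this consistency term: build a patch-to-patch correlation matrix from each network's features and penalize the difference between the two matrices. The cosine similarity and the mean squared difference used below are assumptions, as are all names and toy features. The summary does not specify the paper's exact similarity measure or distance.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def correlation_matrix(features):
    """N x N matrix of pairwise similarities among N patch features."""
    return [[cosine(u, v) for v in features] for u in features]


def pcc_loss(det_features, bpm_features):
    """Mean squared difference between the two correlation matrices."""
    c_det = correlation_matrix(det_features)
    c_bpm = correlation_matrix(bpm_features)
    n = len(c_det)
    return sum((c_det[i][j] - c_bpm[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


# Three patches; the two networks may use different feature dimensions,
# because only the N x N matrices are compared (toy values).
det_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bpm_same_structure = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 0.0]]
bpm_other_structure = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

loss_same = pcc_loss(det_features, bpm_same_structure)
loss_other = pcc_loss(det_features, bpm_other_structure)
print(loss_same, loss_other)
```

Comparing relationships between patches rather than the raw features is what lets such a term carry contextual information across cells in the same image.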
Main Results
Experimental Results
Experiments validated the superior performance of the proposed method on both public and private datasets:
ComparisonDetector Dataset:
This dataset contains 7,410 cervical cell pathology images. With the distillation framework incorporated, the DINO detector achieved an average precision (AP) of 24.6, AP@0.5 of 44.7, AP@0.75 of 23.6, and an average recall (AR) of 46.6 on this dataset.
Compared to the best existing methods, the proposed method improved AP, AP@0.5, AP@0.75, and AR by 4.0, 3.2, 5.9, and 8.5, respectively.
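For readers unfamiliar with the metric suffixes: assuming the usual COCO-style convention (the summary does not state the evaluation protocol), AP@0.5 and AP@0.75 count a predicted box as a match only if its intersection over union (IoU) with a ground-truth box reaches 0.5 or 0.75, respectively. A minimal sketch with made-up boxes:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up boxes: the prediction covers 60 of the 100 ground-truth pixels.
ground_truth = (0, 0, 10, 10)
prediction = (0, 0, 10, 6)

overlap = iou(prediction, ground_truth)
matches_at_50 = overlap >= 0.5    # counts as a match for AP@0.5
matches_at_75 = overlap >= 0.75   # too loose for AP@0.75
print(overlap, matches_at_50, matches_at_75)
```

This is why AP@0.75 is always the stricter of the two figures: the same detection can count as correct at one threshold and as a miss at the other.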
DST Dataset:
This dataset, sourced from a collaborative hospital, contains 3,807 images cropped from WSIs at 1024×1024 pixels. Combined with the proposed method, the DINO detector achieved an AP of 15.4, AP@0.5 of 26.3, AP@0.75 of 16.5, and AR of 45.1.
Compared to the original DINO model, the proposed method significantly improved AP, AP@0.5, AP@0.75, and AR.
Ablation Experiments
A series of ablation experiments was conducted to verify the effectiveness of each component. The results show the importance of the Balanced Pre-Training Model (BPM), Score Correction Loss (SCL), and Patch Correlation Consistency strategy (PCC). Specifically, the BPM model incorporating synthetic data significantly enhances the accuracy of the classification network, thereby improving the overall precision of the detection model.
Conclusion and Significance
This paper proposes an innovative distillation framework that addresses several key issues in cervical cell detection. By combining a patch-level classification network with score correction and patch correlation strategies, the proposed method significantly improves the performance of existing detectors in multi-class cervical lesion cell detection. This has important scientific value and contributes to the further development of high-throughput screening methods for cervical cancer in practical clinical applications.
Research Highlights
Novel Distillation Framework: Utilizes a patch-level classification network to guide the training of an image-level detection network. This is the first distillation method to use a cervical cell classification network to optimize abnormal cell detection.
Solves Multiple Problems: Effectively addresses issues of incomplete annotation, class imbalance, and underutilization of intercellular relationships.
Flexible Generalizability: The method can be seamlessly applied to various detectors without altering their structure during inference.
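The generalizability point can be pictured as follows: the distillation terms enter only the training objective, while inference calls the detector alone. This is a minimal sketch under assumptions. The additive combination, the weights `w_scl` and `w_pcc`, and the stub detector are hypothetical and are not taken from the paper.

```python
def training_loss(det_loss, scl, pcc, w_scl=1.0, w_pcc=1.0):
    """Training objective: the detector's own loss plus the two
    distillation terms (the weighting scheme is an assumption)."""
    return det_loss + w_scl * scl + w_pcc * pcc


def infer(detector, image):
    """Inference uses the detector only; the patch-level teacher is not called."""
    return detector(image)


def stub_detector(image):
    # Stand-in for any detector; returns one fake (box, label, score) tuple.
    return [((0, 0, 10, 10), "HSIL", 0.9)]


total = training_loss(det_loss=1.0, scl=0.5, pcc=0.25)
detections = infer(stub_detector, image=None)
print(total, detections)
```

Because nothing in `infer` depends on the teacher, any detector with its own training loss could in principle be slotted in without structural changes.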
The findings indicate that the proposed method significantly enhances cervical abnormal cell detection and offers valuable technical support for practical clinical applications, holding broad potential for future implementation.