Knowledge Probabilization in Ensemble Distillation: Improving Accuracy and Uncertainty Quantification for Object Detectors
Research on the Application of Knowledge Probabilization in Ensemble Distillation
Academic Background: Significance of the Research and Problem Statement
In recent years, deep neural networks (DNNs) have found broad application in safety-critical fields such as autonomous driving, medical diagnosis, and climate prediction thanks to their strong predictive capabilities. However, these fields demand not only high predictive accuracy but also reliable uncertainty quantification (UQ). For instance, an autonomous vehicle operating in snowy conditions may make unsafe decisions if its perception model is overconfident. Enhancing uncertainty quantification has therefore become a crucial topic in the application of deep learning.
Deep ensembles have become an important line of research owing to their strong performance in improving both prediction accuracy and UQ. However, ensemble models face significant challenges in practical applications, especially in resource-constrained environments, because of their high computational and storage requirements. To address this issue, researchers have proposed ensemble distillation, which transfers knowledge from multiple deep ensemble teacher models into a single student model to reduce complexity. Nevertheless, existing ensemble distillation methods focus primarily on classification tasks, while their application to object detection and the enhancement of UQ remain underexplored.
Paper Information: Research Source and Institutional Background
The paper, “Knowledge Probabilization in Ensemble Distillation: Improving Accuracy and Uncertainty Quantification for Object Detectors,” published in January 2025 in IEEE Transactions on Artificial Intelligence, was written by researchers from the University of Science and Technology of China (USTC), A*STAR’s Institute for Infocomm Research and Centre for Frontier AI Research in Singapore, and East China Normal University. In this study, authors Yang Yang, Chao Wang (IEEE Senior Member), Lei Gong, and others propose an innovative framework, PROBED (Probabilization-Based Ensemble Distillation), which offers a new solution for enhancing both the prediction accuracy and the UQ of object detection models.
Research Workflow: Detailed Study Design Based on PROBED Framework
Overview of the Workflow
The PROBED framework enhances the UQ performance of the student model by converting three kinds of knowledge from the ensemble (feature knowledge, semantic knowledge, and localization knowledge) into probability distribution representations. The major components of the research workflow are:
Feature Knowledge Extraction and Probabilization
Saliency filtering is employed to select significant regions of the feature maps; the values of these regions are then converted into histogram probability distributions.
Semantic Knowledge Transfer
Classification score vectors from the detection heads of the teacher models are used directly as natural probability distributions, with no additional transformation.
Localization Knowledge Probabilization
Bounding-box coordinates are discretized into a set of bins, and a probability distribution over the bins is produced with the softmax function.
Random Smoothing Perturbation
A random smoothing kernel is applied to perturb the input data, increasing the diversity of the teacher outputs from which the student learns. A code sketch of these steps follows this list.
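The paper gives the precise formulations; purely as an illustration, the PyTorch sketch below shows one plausible reading of these steps: a soft histogram over saliency-filtered feature values, a softmax distribution over discretized box offsets, and Gaussian input smoothing. All function names, bin counts, temperatures, and noise scales here are assumptions, not the authors’ implementation.

```python
import torch
import torch.nn.functional as F

def feature_histogram(values: torch.Tensor, num_bins: int = 16,
                      tau: float = 0.1) -> torch.Tensor:
    """Soft histogram over saliency-filtered feature values.

    `values` is a 1-D tensor of activations from the selected regions;
    the bin count and temperature are illustrative guesses.
    """
    centers = torch.linspace(values.min().item(), values.max().item(), num_bins)
    # Each value softly votes for nearby bin centers; summing the votes
    # yields a differentiable histogram, normalized to a probability vector.
    votes = F.softmax(-(values[:, None] - centers[None, :]).abs() / tau, dim=1)
    hist = votes.sum(dim=0)
    return hist / hist.sum()

def box_distribution(offsets: torch.Tensor, num_bins: int = 17,
                     max_offset: float = 16.0, tau: float = 1.0) -> torch.Tensor:
    """Discretize continuous box offsets into a softmax distribution over
    bins (the bin layout is an assumed design, in the spirit of
    distributional box regression)."""
    bins = torch.linspace(0.0, max_offset, num_bins)
    logits = -((offsets[..., None] - bins) ** 2) / tau
    return F.softmax(logits, dim=-1)

def smooth_inputs(images: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    """Random smoothing: Gaussian input perturbation so that repeated
    teacher queries yield more diverse targets (noise scale is a guess)."""
    return images + sigma * torch.randn_like(images)

# Example: train the student to match a teacher distribution via KL divergence.
teacher_p = feature_histogram(torch.randn(256))
student_p = feature_histogram(torch.randn(256))
loss = F.kl_div(student_p.log(), teacher_p, reduction="sum")
```

Here the KL divergence plays the role of the distillation loss that the probabilized representations make possible; the paper’s actual loss terms and their weighting are not reproduced here.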
Experiment Design and Methodology
The study used common object detection datasets, including COCO, Foggy COCO, and PASCAL VOC, and tested five mainstream detectors: Faster R-CNN, RetinaNet, FCOS, YOLOv3, and DETR. Through a series of comparative experiments, PROBED was shown to significantly improve accuracy, UQ, and cross-domain robustness.
During training, the CNN-based models such as Faster R-CNN followed the standard 2x learning rate schedule (24 epochs in total), with the learning rate reduced at epochs 16 and 22. The transformer-based DETR was trained for 50 epochs, with a learning rate drop at epoch 40. The study also carefully tuned the key hyperparameters of the saliency filtering and the random perturbation strategy, including the perturbation step size and scale. The schedule for the CNN detectors corresponds to a plain step decay, as sketched below.
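For reference, the “2x schedule with drops at epochs 16 and 22” maps directly onto a standard multi-step decay. A minimal PyTorch sketch (the model and optimizer settings are placeholders, not the paper’s):

```python
import torch

model = torch.nn.Conv2d(3, 8, 3)  # stand-in for a detector; placeholder only
optimizer = torch.optim.SGD(model.parameters(), lr=0.02,
                            momentum=0.9, weight_decay=1e-4)

# 2x schedule: 24 epochs, learning rate divided by 10 at epochs 16 and 22.
# The DETR analog described above would be milestones=[40] over 50 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[16, 22], gamma=0.1)

for epoch in range(24):
    # ... one training epoch over the detection dataset goes here ...
    scheduler.step()
```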
Research Findings: Key Discoveries and Supporting Data
Improvement in Prediction Accuracy
Across both the COCO and PASCAL VOC datasets, PROBED achieved higher mean average precision (mAP) than other ensemble distillation methods on all of the tested detectors. For example, with Faster R-CNN, mAP rose from 37.51 (prior methods) to 37.92.
Enhancement in Uncertainty Quantification
PROBED outperformed other methods on Detection Expected Calibration Error (D-ECE) and localization-aware calibration error (LaECE); see the calibration sketch after this list. On the Foggy COCO dataset, D-ECE fell from 10.94 (prior methods) to 10.01, and LaECE from 17.89 to 17.02.
Effectiveness of the Random Perturbation Strategy
Comparisons with alternative perturbation methods such as ODS and STDiv showed that PROBED’s random smoothing perturbation was the most effective at improving both prediction accuracy and UQ.
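For context, D-ECE and LaECE extend the classification notion of expected calibration error to detections by comparing binned confidences against observed precision. The sketch below implements only the plain binned ECE that underlies both metrics; the detection- and localization-aware conditioning used in the paper is omitted.

```python
import numpy as np

def expected_calibration_error(confidences: np.ndarray,
                               correct: np.ndarray,
                               num_bins: int = 10) -> float:
    """Binned ECE: weighted average gap |precision - confidence| per bin.

    `confidences` holds detection scores in [0, 1]; `correct` holds 0/1
    indicators of whether each detection matches a ground-truth box.
    """
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece, n = 0.0, len(confidences)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += (in_bin.sum() / n) * gap
    return ece
```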
Research Conclusions: Significance and Application Value
By innovatively incorporating knowledge probabilization, the PROBED framework significantly optimized the workflow of ensemble distillation, achieving dual improvements in model accuracy and UQ. Specifically, the framework dramatically reduced computational resource requirements while maintaining high detection performance, providing a practical solution for resource-constrained environments. Further experiments demonstrated strong cross-domain robustness, making it applicable to safety-critical tasks such as autonomous driving and medical diagnosis.
Research Highlights and Innovations
Innovative Knowledge Probabilization Method
By unifying feature, semantic, and localization knowledge as probability distributions, the framework substantially enhanced the student model’s learning efficiency.
Introduction of Random Smoothing Perturbation
To counter the homogenization of ensemble teacher outputs, the random smoothing perturbation strategy effectively improved the student’s ability to learn from diverse predictions.
Broad Applicability
The PROBED framework is applicable to various mainstream object detection algorithms using either CNN or Transformer architectures, with results demonstrating its generality.
Summary
This study achieved a deep integration of theory and practice in ensemble distillation for UQ in object detection. PROBED not only enhanced model accuracy and robustness but also laid a more reliable foundation for deploying deep learning models in safety-critical tasks. As a notable advance in ensemble distillation, PROBED paves the way for more efficient and better-calibrated object detection models in the future.