A Displacement Uncertainty-Based Method for Multi-Object Tracking in Low-Frame-Rate Videos

An Academic Report on Low-Frame-Rate Multi-Object Tracking

Introduction and Research Background

In recent years, multi-object tracking (MOT) has been widely applied in intelligent video surveillance, autonomous driving, and robot vision. However, traditional MOT methods are predominantly designed for high-frame-rate videos and face significant challenges in low-frame-rate scenarios: objects move much farther between consecutive frames, and abrupt changes in object appearance and visibility make detection association and trajectory maintenance considerably harder. Because edge devices often face constraints on computing, storage, and transmission bandwidth, low-frame-rate video has become an important choice for efficient deployment, but these technical difficulties must first be addressed.

This study, conducted by researchers from Zhejiang University and the Hong Kong University of Science and Technology, is published in the International Journal of Computer Vision, titled “AppTracker+: Displacement Uncertainty for Occlusion Handling in Low-Frame-Rate Multiple Object Tracking.” The research aims to address the association challenges of MOT in low-frame-rate scenarios, proposing a novel online tracking method, AppTracker+, and demonstrating its robustness and effectiveness through experiments.

Research Methods and Technical Implementation

Overall Framework

Building on the existing CenterTrack framework, the authors introduced new components, including the “APP Head” (Appear Predictor) and a displacement uncertainty estimation module to address the reliability issues of associations in low-frame-rate scenarios. A multi-stage matching strategy is proposed, optimizing the association process by integrating visual cues and historical motion information.

Key Technologies and Innovations

  1. Design of the APP Head: The APP Head identifies newly appearing objects in the current frame (i.e., objects that were not visible in the previous frame). By introducing this module, the model can recognize unreliable displacement estimations, avoiding identity switches caused by incorrect associations.

  2. Displacement Uncertainty Estimation: The authors reformulated the displacement estimation task as a heteroscedastic regression task, leveraging Bayesian deep learning methods to capture the uncertainty of each displacement estimation. The variance output by this module quantifies estimation errors, guiding subsequent association decisions.

  3. Multi-Stage Matching Strategy: A hybrid matching strategy based on displacement uncertainty is proposed. High-confidence objects are first matched greedily, which is robust to occasional outlier costs; the remaining objects are then matched with the Hungarian algorithm to absorb minor displacement errors.

  4. Data Augmentation and Training Optimization: To address the scarcity of training samples for the APP Head, a static image augmentation strategy is introduced, generating simulated samples of newly appearing objects by randomly erasing objects in the image. Additionally, a circular mask-based heatmap strategy is designed to address noisy supervision for low-visibility objects.
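The heteroscedastic formulation in item 2 can be sketched as a Gaussian negative log-likelihood in which the network predicts both a displacement and a per-object log-variance. This is a minimal sketch of the standard technique, not the paper's code; the function name and shapes are illustrative.

```python
import numpy as np

def heteroscedastic_nll(pred_disp, log_var, target_disp):
    """Gaussian NLL for displacement regression with learned uncertainty.

    pred_disp, target_disp: (N, 2) predicted / ground-truth offsets.
    log_var: (N,) predicted log-variance per object; predicting the log
    keeps the variance positive and the loss numerically stable.
    """
    sq_err = np.sum((pred_disp - target_disp) ** 2, axis=1)
    # Isotropic-Gaussian NLL with constants dropped:
    # 0.5 * exp(-log_var) * ||err||^2 + 0.5 * log_var
    return np.mean(0.5 * np.exp(-log_var) * sq_err + 0.5 * log_var)
```

A large predicted variance discounts the squared-error term (at the cost of the log-variance penalty), so the network can flag unreliable displacements, e.g., under occlusion, instead of being forced to fit them exactly.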
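The two-stage association in item 3 can be sketched as follows: tracks with low displacement uncertainty are matched greedily (cheapest pair first), and the leftovers go through the Hungarian algorithm via `scipy.optimize.linear_sum_assignment`. The thresholds and cost construction here are illustrative assumptions, not the paper's exact values.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hybrid_match(cost, uncertainty, unc_thresh=1.0, cost_thresh=5.0):
    """Two-stage association: greedy for confident tracks, Hungarian for the rest.

    cost: (T, D) pairwise distances between T tracks and D detections.
    uncertainty: (T,) per-track displacement uncertainty.
    Returns a list of (track, detection) index pairs.
    """
    T, D = cost.shape
    matches, used_t, used_d = [], set(), set()

    # Stage 1: greedy matching over low-uncertainty tracks, cheapest pair first.
    confident = [t for t in range(T) if uncertainty[t] < unc_thresh]
    pairs = sorted((cost[t, d], t, d) for t in confident for d in range(D))
    for c, t, d in pairs:
        if c < cost_thresh and t not in used_t and d not in used_d:
            matches.append((t, d))
            used_t.add(t); used_d.add(d)

    # Stage 2: Hungarian matching on the remaining tracks and detections.
    rest_t = [t for t in range(T) if t not in used_t]
    rest_d = [d for d in range(D) if d not in used_d]
    if rest_t and rest_d:
        sub = cost[np.ix_(rest_t, rest_d)]
        rows, cols = linear_sum_assignment(sub)
        for r, c in zip(rows, cols):
            if sub[r, c] < cost_thresh:
                matches.append((rest_t[r], rest_d[c]))
    return matches
```

The greedy pass is insensitive to a single corrupted cost entry (it simply never picks it), while the global Hungarian pass distributes small residual displacement errors optimally across the remaining pairs.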
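The static-image augmentation in item 4 can be sketched as erasing object boxes from a simulated "previous" frame: an object erased there but kept in the current frame behaves like a newly appearing object for the APP Head. This is a minimal illustration under assumed box and fill conventions, not the paper's implementation.

```python
import numpy as np

def erase_objects(prev_img, boxes, rng, p=0.5):
    """Randomly erase object boxes from the simulated previous frame.

    Objects erased here remain visible in the current frame, so they act
    as positive 'newly appeared' training samples for the APP Head.
    boxes: list of (x1, y1, x2, y2) integer corners.
    Returns the augmented image copy and the indices of erased boxes.
    """
    img = prev_img.copy()
    erased = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if rng.random() < p:
            # Fill with a constant (zeros here; mean color is another option).
            img[y1:y2, x1:x2] = 0
            erased.append(i)
    return img, erased
```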

Experimental Design

The study conducted experiments on public datasets, including MOT17, MOT20, and KITTI, simulating video scenarios at different frame rates to validate the model’s performance. Evaluation metrics include MOTA, IDF1, and HOTA, which are commonly used in MOT.
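For reference, MOTA aggregates false negatives, false positives, and identity switches against the total number of ground-truth objects. The one-liner below is the standard CLEAR-MOT definition, not code from the paper.

```python
def mota(fn, fp, ids, num_gt):
    """MOTA = 1 - (FN + FP + IDS) / GT, accumulated over all frames.

    fn: missed ground-truth objects, fp: false detections,
    ids: identity switches, num_gt: total ground-truth objects.
    """
    return 1.0 - (fn + fp + ids) / num_gt
```

For example, 50 misses, 30 false positives, and 20 identity switches over 1000 ground-truth objects give a MOTA of 0.9. IDF1 instead measures how consistently identities are preserved, which is why it is the more sensitive metric for the association errors this paper targets.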

Experimental Results and Analysis

Performance Improvements

  1. Effectiveness of the APP Head: Experiments showed that introducing the APP Head significantly reduced identity switches (IDS). On the MOT17 validation set at 1/10 of the original frame rate, the IDS rate decreased from 4.5% to 3.9%.

  2. Impact of Displacement Uncertainty Estimation: The displacement uncertainty module further optimized the matching process. In the MOT17 validation set, the IDF1 score improved to 72.5%.

  3. Advantages of the Multi-Stage Matching Strategy: Compared with single Hungarian matching or greedy matching, the hybrid matching strategy demonstrated superior handling of detection and association noise in low-frame-rate scenarios.

Comparison with Existing Methods

Compared with classic methods such as FairMOT, ByteTrack, and CenterTrack, AppTracker+ exhibited superior identity-preserving capability in low-frame-rate scenarios, particularly under complex occlusions. On the MOT17 validation set at 1/10 frame rate, AppTracker+ achieved the highest IDF1 score among all compared methods.

Cross-Dataset Evaluation

To test generalization across datasets, the model was pre-trained on MOT17 and tested on MOT20. Despite significant differences in object appearance and occlusion patterns, AppTracker+ maintained high association accuracy, demonstrating strong generalization capability.

Conclusion and Significance

This paper addresses the challenges of MOT in low-frame-rate videos by proposing an innovative solution, AppTracker+, which is validated through comprehensive experiments.

Practical Value

  1. Applicability in Real Scenarios: AppTracker+ is suitable for scenarios such as intelligent traffic monitoring, autonomous driving, and robotic navigation, achieving accurate tracking with limited computing resources.

  2. Academic Contribution: By introducing displacement uncertainty analysis, this study brings a new perspective to the MOT field and advances methodologies in low-frame-rate scenarios.

Future Work

The authors proposed several directions for further improvement:

  1. Decoupling the detection and displacement estimation modules for more flexible deployment and optimization.

  2. Enhancing the robustness of the model in extremely low-frame-rate scenarios (e.g., 1/15 frame rate).

  3. Addressing scenarios where multiple targets are simultaneously occluded at the same location.

AppTracker+ provides a high-performance and robust solution for MOT in low-frame-rate videos and contributes positively to advancing research in this domain.