An Enhanced Framework for Real-Time Dense Crowd Abnormal Behavior Detection Using YOLOv8

2025-04-18 Fri
abnormal behavior detection YOLOv8 soft-NMS object detection crowd surveillance HajjV2 deep learning
Academic Background
With the increasing demand for public safety, especially during large-scale religious events such as the Hajj pilgrimage, abnormal behavior detection in dense crowds has become a critical issue. Existing detection methods often perform poorly under complex conditions such as occlusion, illumination variations, and uniform attire, leading to reduced detection accuracy. To address these challenges, researchers are developing more advanced computer vision techniques to improve the accuracy and efficiency of real-time monitoring.
The core of this study is an improved YOLOv8 model, the Crowd Anomaly Detection Framework (CADF), which improves detection accuracy in complex environments by integrating Soft-NMS (a variant of non-maximum suppression that lowers the scores of overlapping boxes instead of discarding them). The model is tuned for Hajj pilgrimage scenarios and is also evaluated on multiple public datasets to assess its applicability and robustness beyond that setting.
Source of the Paper
This paper is co-authored by Rabia Nasir, Zakia Jalil, Muhammad Nasir, Tahani Alsubait, Maria Ashraf, and Sadia Saleem, who are affiliated with different research institutions. The paper was accepted on March 24, 2025, and published in the journal Artificial Intelligence Review with the DOI 10.1007/s10462-025-11206-w.
Research Process
1. Data Preparation and Frame Extraction
The study begins by extracting video frames from the HajjV2 dataset and annotating them. The HajjV2 dataset contains videos of various Hajj pilgrimage scenarios, covering abnormal behaviors such as crowd counterflow, non-human objects, running, sitting, and lying down. The researchers used OpenCV to extract frames from the videos and save them as JPEG images. The annotations for each frame, including bounding box coordinates and category labels, are stored in CSV files and then converted into YOLO format for model training.
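The CSV-to-YOLO conversion step can be sketched as follows. This is a minimal illustration, not the authors' code: the CSV column names (`label`, `x_min`, `y_min`, `x_max`, `y_max`), the assumption that boxes are stored as pixel-space corners, and the class-to-ID ordering in `CLASSES` are all hypothetical, since the summary does not describe the CSV schema. The only fixed part is the target: a YOLO label line holds a class ID followed by the box center, width, and height, each normalized by the image size.

```python
import csv
import io

# Hypothetical class order; the source names these behaviors but not their IDs.
CLASSES = ["counterflow", "non_human_object", "running", "sitting", "lying"]


def to_yolo_line(label, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert one pixel-space corner box into a YOLO label line."""
    class_id = CLASSES.index(label)
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def convert_csv(csv_text, img_w, img_h):
    """Convert every annotation row of one frame's CSV text into YOLO lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        to_yolo_line(
            row["label"],
            float(row["x_min"]), float(row["y_min"]),
            float(row["x_max"]), float(row["y_max"]),
            img_w, img_h,
        )
        for row in reader
    ]


# Assumed sample annotation for a 400x500 frame.
sample = "label,x_min,y_min,x_max,y_max\nrunning,100,50,300,250\n"
print(convert_csv(sample, img_w=400, img_h=500))
# ['2 0.500000 0.300000 0.500000 0.400000']
```

In a real pipeline each frame's lines would be written to a `.txt` file that shares the image's base name, which is the layout YOLO training tools conventionally expect.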
2. Model Training and Soft-NMS Integration
The study adopts YOLOv8 as the base model and improves it by integrating Soft-NMS. Instead of deleting detection boxes that overlap a higher-scoring box, Soft-NMS lowers their scores according to how much they overlap, which retains more valid detections in dense and occluded scenes. Training is divided into two phases: the first uses 15 epochs, an image size of 256, and a batch size of 8; the second uses 20 epochs, an image size of 416, and a batch size of 16.
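To make the difference from standard NMS concrete, here is a minimal pure-Python sketch of Soft-NMS. It assumes the Gaussian decay variant, where a box's score is multiplied by exp(-IoU² / sigma); the summary does not say which variant or which parameter values CADF uses, so `sigma` and `score_thresh` below are illustrative defaults rather than the paper's settings.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (assumed variant).

    Returns (box_index, rescored_score) pairs in selection order.
    """
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best_pos = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))
        # Decay the scores of the other boxes by their overlap with it.
        decayed = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_thresh:  # drop only near-zero scores
                decayed.append((idx, new_score))
        remaining = decayed
    return kept


# Two heavily overlapping people plus one distant detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
for idx, score in soft_nms(boxes, scores):
    print(idx, round(score, 3))
```

In this example the first two boxes have an IoU of about 0.68. Hard NMS with a typical 0.5 threshold would discard the second box outright, whereas here it survives with a reduced score, which is the behavior that helps when two people in a crowd genuinely overlap.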
3. Model Evaluation and Comparison
The study evaluates CADF on the HajjV2 dataset, reporting an AUC (Area Under the Curve) of 88.27%, which is 13.09% and 12.19% higher than YOLOv2 and YOLOv5, respectively, and an accuracy of 91.6%. The study also tests the model on the UCSD and ShanghaiTech datasets to assess its generalization ability. Compared with models such as VGG19 and EfficientDet, CADF scores higher on accuracy, AUC, precision, recall, and mAP (mean Average Precision).
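For readers unfamiliar with the reported metrics, the sketch below shows how AUC, precision, and recall can be computed from per-frame anomaly scores. It is a generic illustration on made-up labels and scores, not a reproduction of the paper's evaluation protocol; the frame-level setup, the 0.5 threshold, and the function names are assumptions. AUC is computed here as the probability that a randomly chosen abnormal frame receives a higher score than a randomly chosen normal one.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall when scores at or above the threshold count as abnormal."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: 1 = abnormal frame, 0 = normal frame.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(roc_auc(labels, scores))           # 8 of 9 positive/negative pairs are ordered correctly
print(precision_recall(labels, scores))  # (0.75, 1.0)
```

The pairwise loop is quadratic in the number of frames, which is fine for an illustration; a rank-based computation gives the same value more efficiently on large test sets.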
Main Results
1. Improved Detection Accuracy
By integrating Soft-NMS, CADF improves detection accuracy on the HajjV2 dataset, particularly in scenes with occlusion and illumination variations. For example, in scenes of crowd counterflow and sitting or lying down, CADF achieves higher recall and precision than traditional methods.
2. Validation of Generalization Ability
Test results on the UCSD and ShanghaiTech datasets show that CADF is not only suitable for Hajj pilgrimage scenarios but can also effectively detect abnormal behaviors in other dense crowd environments. This result demonstrates the model’s adaptability and robustness across different datasets.
3. Comparison with Other Models
Compared with models such as VGG19 and EfficientDet, CADF scores higher on multiple evaluation metrics. For instance, in AUC and mAP, CADF is more than 10% higher than VGG19 and more than 5% higher than EfficientDet. This result further supports the advantage of CADF in dense crowd abnormal behavior detection.
Conclusion and Significance
The CADF framework proposed in this study improves the accuracy and robustness of dense crowd abnormal behavior detection by integrating Soft-NMS. The framework performs well in Hajj pilgrimage scenarios and also shows broad applicability on multiple public datasets. The findings matter for the safety of large-scale public events, especially high-risk settings such as religious gatherings and sports events, where early detection of abnormal behavior can help prevent accidents like stampedes.
Moreover, the application of the CADF framework aligns with the United Nations Sustainable Development Goals (SDGs) Goal 3 (Good Health and Well-being) and Goal 11 (Sustainable Cities and Communities). By leveraging technological means to enhance public safety, it provides strong support for building safer and more sustainable urban environments.
Research Highlights
Soft-NMS Integration: By dynamically adjusting detection box scores, detection accuracy in occluded and dense scenes is significantly improved.
Multi-Dataset Validation: Validated on multiple datasets, including HajjV2, UCSD, and ShanghaiTech, demonstrating the model’s broad applicability.
Comparison with Advanced Models: Outperforms models such as VGG19 and EfficientDet in multiple evaluation metrics, showcasing its superiority.
Practical Application Value: The research findings are of great significance for enhancing the safety of large-scale public events, especially in high-risk scenarios where it can effectively prevent accidents.
Other Valuable Information
This study also explores the potential of the CADF framework in real-time monitoring, achieving efficient real-time detection by optimizing the model architecture and training strategies. Additionally, the study proposes future research directions, such as further optimizing the model’s performance in extreme environments and exploring more application scenarios.