Probabilistic Memory Auto-Encoding Network for Abnormal Behavior Detection in Surveillance Video

Research on Abnormal Behavior Detection in Surveillance Video Based on Probabilistic Memory Auto-Encoding Network

Academic Background

In intelligent surveillance systems, abnormal behavior detection is a crucial function, widely applied in counter-terrorism, the maintenance of social stability, and public safety. A core challenge, however, is the extreme imbalance between normal and abnormal behavior data: normal behavior data is usually abundant and easy to obtain, while abnormal behavior data is scarce and difficult to predict. This imbalance makes it hard to train models effectively with traditional supervised learning. An important research direction is therefore to use the large amount of available normal data to model the distribution of normal behavior, and to detect abnormal behavior as a deviation from that distribution.

In recent years, deep learning-based methods have made significant progress in abnormal behavior detection. In particular, methods based on video frame reconstruction and future frame prediction are considered superior to traditional reconstruction methods. However, existing methods still have limitations when dealing with complex scenes and multi-modal normal behaviors. To address these issues, this study proposes a semi-supervised abnormal behavior detection algorithm based on the Probabilistic Memory Auto-Encoding Network (PMAE).

Source of the Paper

Model of Abnormal Behavior Detection in Surveillance Video Based on Probabilistic Memory Auto-Encoding Network

The paper was co-authored by Jinsheng Xiao, Jingyi Wu, Shurui Wang, and Qiuze Yu from the School of Electronic Information at Wuhan University, Honggang Xie from the School of Electrical and Electronic Engineering at Hubei University of Technology, and Yuan-Fang Wang from the Department of Computer Science at the University of California, Santa Barbara. Titled Probabilistic Memory Auto-Encoding Network for Abnormal Behavior Detection in Surveillance Video, it was published in the journal Neural Networks in 2025.

Research Process

1. Research Design

The goal of this study is to learn the distribution of normal behaviors and to detect abnormal behavior as data that deviates from this distribution. To this end, the research team designed a framework built on an Auto-Encoding Network, with a probabilistic model and a memory module that assist in modeling normal behavior patterns.

2. Auto-Encoding Network

The Auto-Encoding Network serves as the backbone for extracting spatiotemporal features from video frames. To avoid leaking future information, the research team employed Causal 3D Convolution and fully connected layers shared across the time dimension, so that the features computed for a given time step depend only on the current and earlier frames. The network consists of an encoder, a decoder, and a frame predictor. The encoder maps the input video frame group to a hidden vector, the decoder reconstructs the hidden vector into a multi-dimensional spatiotemporal feature map, and the frame predictor converts this feature map into the final predicted frame.
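The causality constraint can be illustrated with a minimal sketch. It is not the paper's 3D layer: it assumes a single scalar feature per frame and convolves along the time axis only, and the function name `causal_conv1d` and the toy kernel are hypothetical. The point it shows is the padding scheme, where zeros are added only before the first frame so that no output reads a later frame.

```python
def causal_conv1d(frames, kernel):
    """Convolve along time so that output[t] uses only frames[0..t]."""
    k = len(kernel)
    # Left-pad with k-1 zeros and never pad on the right,
    # so the window ending at time t contains no future frame.
    padded = [0.0] * (k - 1) + list(frames)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(frames))
    ]


frames = [1.0, 2.0, 3.0, 4.0]   # toy data: one scalar feature per frame
kernel = [0.25, 0.25, 0.5]      # the last weight multiplies the current frame
out = causal_conv1d(frames, kernel)

# Changing the last frame must leave all earlier outputs untouched.
changed = causal_conv1d([1.0, 2.0, 3.0, 99.0], kernel)
print(out[:3] == changed[:3])   # True
```

A causal 3D convolution applies the same one-sided padding to the time axis while padding the two spatial axes symmetrically as usual.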

3. Probabilistic Model

To fit the distribution of the input data, the research team designed an Autoregressive Conditional Probability Estimation Model. The model factorizes the probability of the hidden vector into a product of conditional probabilities and evaluates them recursively, each element conditioned on the elements before it, which lets the network converge to a low-entropy state on normal behavior data. Specifically, the model uses fully connected layers stacked in a fixed order to estimate the conditional probability density of each hidden vector element, avoiding the uncertainty of manual sorting.
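The autoregressive factorization p(z) = p(z_1) · p(z_2 | z_1) · ... · p(z_d | z_1..z_{d-1}) can be sketched with a single masked fully connected layer. This is an illustration under stated assumptions rather than the paper's estimator: the hidden vector is taken to be binary and each conditional is modelled as a Bernoulli, and the names `masked_linear` and `bernoulli_nll` as well as the toy weights are hypothetical.

```python
import math


def masked_linear(z, weights, bias):
    """Fully connected layer where output i sees only z[0..i-1]."""
    d = len(z)
    # range(i) enforces a strictly lower-triangular mask:
    # weights on or above the diagonal are never read.
    return [
        bias[i] + sum(weights[i][j] * z[j] for j in range(i))
        for i in range(d)
    ]


def bernoulli_nll(z, weights, bias):
    """Negative log-likelihood -sum_i log p(z_i | z_<i) for binary z."""
    nll = 0.0
    for z_i, logit in zip(z, masked_linear(z, weights, bias)):
        p_one = 1.0 / (1.0 + math.exp(-logit))
        nll -= math.log(p_one if z_i == 1 else 1.0 - p_one)
    return nll


weights = [[0.0, 0.0, 0.0],
           [2.0, 0.0, 0.0],
           [-1.0, 1.5, 0.0]]    # toy values, not learned parameters
bias = [0.0, 0.0, 0.0]

z = [1, 0, 1]
logits = masked_linear(z, weights, bias)
print(logits)                   # [0.0, 2.0, -1.0]
print(bernoulli_nll(z, weights, bias))
```

Because the element order is fixed by the mask, no ordering has to be chosen by hand at inference time. A vector that the model finds unlikely yields a large negative log-likelihood, which is the quantity an entropy-based anomaly term would build on.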

4. Memory Module

The memory module stores normal behavior features from historical data and fuses the memory vectors with the current input. Its read operation works like an attention mechanism: fusion weights are generated from the cosine similarity between the query vector and each memory vector, and the weighted memory vectors form a new query vector. Its update operation injects information from the current input into the memory vectors through weighted averaging, so that the memory is updated continuously.
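The read and update operations can be sketched as follows. The text specifies cosine similarity and weighted averaging; normalizing the similarities with a softmax and scaling the update with a `rate` parameter are assumptions made for this illustration, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def softmax(xs):
    m = max(xs)                         # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def read(query, memory):
    """Blend memory items using softmax-normalized cosine similarities."""
    weights = softmax([cosine(query, item) for item in memory])
    fused = [
        sum(w * item[k] for w, item in zip(weights, memory))
        for k in range(len(query))
    ]
    return fused, weights


def update(query, memory, rate=0.1):
    """Move each memory item toward the query by a weighted average."""
    weights = softmax([cosine(query, item) for item in memory])
    return [
        [(1 - rate * w) * m_k + rate * w * q_k for m_k, q_k in zip(item, query)]
        for w, item in zip(weights, memory)
    ]


memory = [[1.0, 0.0], [0.0, 1.0]]       # two stored normal patterns (toy data)
query = [1.0, 0.0]
fused, weights = read(query, memory)
new_memory = update(query, memory)
print(weights[0] > weights[1])          # True: the first item matches the query
```

Since the fused vector is always a combination of stored normal patterns, a query produced by an abnormal frame is pulled back toward normal features, which makes abnormal frames harder to reconstruct well.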

5. Objective Function and Anomaly Score

The research team defined the objective function and anomaly score from three aspects: reconstruction error, probability entropy, and memory features. The reconstruction error measures the difference between the predicted frame and the real frame using Mean Squared Error (MSE); the probability entropy measures the probability distribution of the hidden vector using cross-entropy loss; the memory features reduce intra-class differences and increase inter-class differences through feature tightness loss and feature separation loss. Finally, the anomaly score is a weighted average of the contributions from each module.
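A weighted combination of the three terms can be sketched as below. The per-frame inputs are assumed to be available already; the min-max normalization over the frames of one video and the weights `(0.5, 0.25, 0.25)` are assumptions chosen for illustration, not values taken from the paper.

```python
def mse(pred, target):
    """Mean squared error between a predicted and a real frame (flattened)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def min_max(values):
    """Rescale per-frame values of one video to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def anomaly_scores(recon_err, entropy, mem_dist, weights=(0.5, 0.25, 0.25)):
    """Weighted average of the normalized per-frame terms."""
    terms = [min_max(recon_err), min_max(entropy), min_max(mem_dist)]
    total = sum(weights)
    return [
        sum(w * term[i] for w, term in zip(weights, terms)) / total
        for i in range(len(recon_err))
    ]


# Toy per-frame values for a three-frame video; the last frame is the outlier.
recon_err = [0.01, 0.02, 0.09]
entropy = [1.0, 1.2, 3.0]
mem_dist = [0.1, 0.1, 0.5]
scores = anomaly_scores(recon_err, entropy, mem_dist)
print(scores.index(max(scores)))        # 2
```

Normalizing each term before averaging keeps one term from dominating simply because it is measured on a larger scale.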

Main Results

1. Experimental Configuration

The research team evaluated performance on two public datasets: UCSD Ped2 and ShanghaiTech. The UCSD Ped2 dataset contains 16 training videos and 12 test videos with a resolution of 360×240; the ShanghaiTech dataset contains 437 campus surveillance videos with a resolution of 856×480. The experiments used Python 3.6 and the PyTorch 1.1.0 framework, with training and testing run on an NVIDIA Tesla V100 GPU.

2. Ablation Experiments

To explore the role of each module, the research team conducted ablation experiments on the UCSD Ped2 dataset. The results showed that skip-layer connections significantly improve the network’s reconstruction ability, that the probabilistic model performs better with short video frame groups, and that the memory module performs better with long video frame groups. Overall, each added module has a positive impact on the network’s performance.

3. Comparison with Classic Algorithms

The research team compared the PMAE algorithm with several classic algorithms. PMAE achieved an AUC of 0.958 on the UCSD Ped2 dataset and 0.729 on the ShanghaiTech dataset, outperforming most of the compared algorithms. In addition, its inference speed reached 96.3 FPS, which meets the requirements of real-time surveillance.
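For readers unfamiliar with the metric, the AUC can be computed directly from its definition: the probability that a randomly chosen abnormal frame receives a higher anomaly score than a randomly chosen normal frame, with ties counted as one half. The sketch below uses that pairwise definition on toy data; the function name `frame_auc` is hypothetical and this is not the paper's evaluation script.

```python
def frame_auc(scores, labels):
    """AUC as P(abnormal score > normal score); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]   # abnormal frames
    neg = [s for s, y in zip(scores, labels) if y == 0]   # normal frames
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal frame")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


scores = [0.1, 0.4, 0.35, 0.8]          # toy anomaly scores
labels = [0, 0, 1, 1]                   # 1 marks an abnormal frame
print(frame_auc(scores, labels))        # 0.75
```

The pairwise loop is quadratic in the number of frames, which is fine for illustration; a rank-based computation gives the same value more efficiently on full datasets.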

Conclusion and Significance

This study proposes a semi-supervised abnormal behavior detection algorithm based on the Probabilistic Memory Auto-Encoding Network, which learns the distribution of normal behaviors and detects data that deviates from it. On the two public datasets tested, UCSD Ped2 and ShanghaiTech, the algorithm combines high detection accuracy (AUC values of 0.958 and 0.729) with real-time speed (96.3 FPS). Moreover, its design accounts for the multi-modal nature of normal behaviors, which helps it avoid reconstructing abnormal frames and improves detection rates.

Research Highlights

  1. Probabilistic Memory Auto-Encoding Network: By combining the probabilistic model and the memory module, it effectively addresses the imbalance between normal behavior data and abnormal behavior data.
  2. Autoregressive Conditional Probability Estimation Model: Fits the distribution of input data through an autoregressive process, enabling the network to converge to a low-entropy state and enhancing its ability to model normal behaviors.
  3. Memory Module: Stores multiple normal behavior patterns, achieving the coexistence of multi-modal normal behavior data and avoiding the reconstruction of abnormal frames.
  4. Real-Time Performance: The algorithm’s inference speed reaches 96.3 FPS, meeting the requirements for real-time surveillance.

Other Valuable Information

The research team also demonstrated the algorithm’s performance on real surveillance videos through visualization experiments. The results show that the PMAE algorithm can identify abnormal behaviors effectively across multiple scenarios. The team also visualized the feature distributions of each module with the t-SNE method, which further supports the algorithm’s effectiveness.

This study provides an effective method for solving the problem of abnormal behavior detection in surveillance videos, with significant scientific value and application prospects.