Estimation of Heart Rate and Respiratory Rate by Fore-Background Spatiotemporal Modeling of Videos

A New Method for Estimating Heart Rate and Respiratory Rate from Videos

Background and Research Motivation

Heart rate (HR) and respiratory rate (RR) are critical physiological parameters reflecting cardiorespiratory functions. These metrics are widely used in medical, health monitoring, and psychological and behavioral studies. Traditionally, these parameters are measured using contact-based sensors, such as electrocardiography (ECG) or photoplethysmography (PPG) for HR and respiratory belts or airflow measurement devices for RR. However, in daily life, these contact-based methods face significant limitations, including the discomfort of wearing the devices, potential skin irritation, and challenges in scenarios requiring non-contact monitoring.

In recent years, non-contact physiological signal estimation from videos has received increasing attention. This approach analyzes subtle changes in skin color or body movements to estimate HR and RR without direct contact. However, existing video-based methods lack robustness under varying ambient lighting, which limits their accuracy and reliability in real-world applications. To address this challenge, the authors propose a novel Fore-Background Spatiotemporal Modeling (FBST) method. By jointly modeling foreground and background illumination, FBST removes external lighting interference and improves the accuracy and adaptability of HR and RR estimation from videos.

Paper Source and Authors

The paper titled “Estimation of heart rate and respiratory rate by fore-background spatiotemporal modeling of videos” was authored by Xiujuan Zheng, Wenqin Yan, Boxiang Liu, Yue Ivan Wu, and Haiyan Tu from the College of Electrical Engineering and the College of Electronics and Information Engineering at Sichuan University. The study was published in the Biomedical Optics Express journal (Vol. 16, No. 2) on February 1, 2025. The research was supported by the National Natural Science Foundation of China (62271333) and the Sichuan Provincial Science and Technology Support Program (2022YFS0032).

Research Methods and Workflow

This study introduces the FBST method, which estimates HR and RR in parallel while modeling and compensating for ambient lighting variations. The workflow has four main steps:

1. Defining Regions of Interest (ROI) and Signal Acquisition

First, the authors defined the foreground and background Regions of Interest (ROIs) in the videos. The foreground included the face and chest to extract pulse signals (face) and respiratory signals (chest), while the background comprised non-human regions of the video. To automatically segment the foreground ROIs, the SeetaFace algorithm was used, ensuring accurate extraction of physiological signals.

For improved accuracy, the face region was divided into multiple small ROIs, with noisy corner areas removed. The chest region ROIs were selected based on their Signal-to-Noise Ratio (SNR). Multiple background ROIs were processed using Principal Component Analysis (PCA) to extract the primary background illumination signals.
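The PCA step for the background ROIs can be illustrated with a minimal sketch. It assumes each background ROI has already been reduced to one brightness value per frame, and it uses power iteration to find the first principal component. The function name, the input layout, and the use of power iteration are illustrative assumptions, not details taken from the paper.

```python
import math

def first_principal_component(signals, iterations=200):
    """Project background ROI traces onto their first principal axis.

    signals: list of equal-length lists, one brightness trace per ROI
    (hypothetical input layout chosen for this sketch).
    """
    n_roi, n_t = len(signals), len(signals[0])
    # Centre each trace so the covariance reflects variation, not brightness.
    centred = []
    for s in signals:
        mean = sum(s) / n_t
        centred.append([x - mean for x in s])
    # ROI-by-ROI sample covariance matrix.
    cov = [[sum(a * b for a, b in zip(centred[i], centred[j])) / (n_t - 1)
            for j in range(n_roi)] for i in range(n_roi)]
    # Power iteration converges to the dominant eigenvector of cov.
    w = [1.0] * n_roi
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(n_roi)) for i in range(n_roi)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        w = [x / norm for x in w]
    # One projected value per frame: the shared illumination trend.
    return [sum(w[i] * centred[i][t] for i in range(n_roi)) for t in range(n_t)]
```

If every background ROI sees the same flicker at a different gain, the returned trace is that flicker scaled by the norm of the gain vector, which is the sense in which PCA recovers the "primary" background illumination signal.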

2. Spatiotemporal Modeling and Image Construction

The authors established a foreground-background model based on the Dichromatic Reflection Model, describing the time-varying optical reflection characteristics of the face and chest regions. Specifically, diffuse reflection in the facial region was linked to blood volume changes and HR, while respiratory-induced chest movements caused variations in chest specular reflection.
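For reference, the dichromatic reflection model is commonly written in the remote-PPG literature in the form below. The notation is the conventional one and is an assumption here; the paper's own symbols and its foreground-background extension may differ.

```latex
\mathbf{C}_k(t) = I(t)\,\bigl(\mathbf{v}_s(t) + \mathbf{v}_d(t)\bigr) + \mathbf{v}_n(t)
```

Here $\mathbf{C}_k(t)$ is the RGB value of the $k$-th pixel, $I(t)$ is the illumination intensity, $\mathbf{v}_s(t)$ and $\mathbf{v}_d(t)$ are the specular and diffuse reflection components, and $\mathbf{v}_n(t)$ is camera noise. In the terms used above, the pulse modulates $\mathbf{v}_d(t)$ in the face region and respiratory motion modulates $\mathbf{v}_s(t)$ in the chest region, while $I(t)$ is the factor that ambient lighting changes disturb.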

Using the models, spatiotemporal maps for the foreground and background were constructed in matrix form to capture both temporal and spatial information comprehensively.
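A minimal sketch of building such a map follows: each ROI contributes one row, each frame one column. The per-row min-max normalization is an assumption borrowed from common practice with spatiotemporal maps, and the function name is hypothetical; the paper may normalize differently.

```python
def build_st_map(roi_signals):
    """Stack per-ROI time series into a matrix with min-max normalized rows.

    roi_signals: list of equal-length lists (rows = ROIs, columns = frames).
    """
    st_map = []
    for signal in roi_signals:
        lo, hi = min(signal), max(signal)
        span = hi - lo
        # A flat trace carries no temporal information; map it to zeros.
        row = [(x - lo) / span if span > 0 else 0.0 for x in signal]
        st_map.append(row)
    return st_map
```

For example, `build_st_map([[10, 20, 30], [5, 5, 5]])` yields one row that ramps from 0 to 1 and one all-zero row, so the spatial axis (ROI index) and temporal axis (frame index) are both preserved in a single matrix.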

3. Design and Application of Spatiotemporal Layers (ST Layers)

To mitigate illumination interference, a lightweight Spatiotemporal Layer (ST Layer) was introduced into the framework. This layer has two variants: a linear ST Layer for simple lighting scenarios and a nonlinear ST Layer with 1×1 convolution and ReLU activation for more complex backgrounds.
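The difference between the two variants can be sketched in plain Python. A 1×1 convolution mixes channels cell by cell, so treating the foreground and background maps as two input channels gives the functions below. The weights, the hidden-channel layout, and both function names are hypothetical; in the actual method the parameters would be learned, and the layer structure may differ from this sketch.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def linear_st_layer(fg, bg, w_fg=1.0, w_bg=-1.0, bias=0.0):
    """1x1 mixing of two channels: each output cell sees only its own cell."""
    return [[w_fg * f + w_bg * b + bias for f, b in zip(fr, br)]
            for fr, br in zip(fg, bg)]

def nonlinear_st_layer(fg, bg, hidden, out_weights):
    """Two 1x1 convolutions with a ReLU in between (assumed structure).

    hidden: list of (w_fg, w_bg, bias) tuples, one per hidden channel.
    out_weights: one weight per hidden channel for the second 1x1 mixing.
    """
    out = []
    for fr, br in zip(fg, bg):
        row = []
        for f, b in zip(fr, br):
            acts = [relu(wf * f + wb * b + c) for wf, wb, c in hidden]
            row.append(sum(v * a for v, a in zip(out_weights, acts)))
        out.append(row)
    return out
```

With the default weights the linear variant simply subtracts the background map from the foreground map. The nonlinear variant can reproduce that behavior but can also express piecewise-linear corrections, which is the extra capacity the text attributes to it for complex backgrounds.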

4. Parameter Estimation Using ResNet-18

Finally, the authors used ResNet-18, a comparatively lightweight convolutional network, as the estimator. Feature maps with the illumination perturbations removed were fed into the network for HR and RR estimation. A transfer learning strategy was applied to pre-train the ResNet-18 model, and training was optimized using L1 loss and the Pearson correlation coefficient.
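One way the two training terms could be combined is sketched below. The additive form, the weights `alpha` and `beta`, and the use of `1 - r` as the correlation penalty are assumptions made for illustration; the text above only states that L1 loss and the Pearson correlation coefficient were both used.

```python
import math

def l1_loss(pred, target):
    """Mean absolute difference between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def pearson(pred, target):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in target)
    denom = math.sqrt(var_p * var_t)
    return cov / denom if denom > 0 else 0.0

def combined_loss(pred, target, alpha=1.0, beta=1.0):
    # alpha and beta are hypothetical weights, not values from the paper.
    return alpha * l1_loss(pred, target) + beta * (1.0 - pearson(pred, target))
```

The L1 term penalizes the absolute offset of each estimate, while the correlation term penalizes estimates that fail to follow the trend of the ground truth across a batch even when the average offset is small.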

Research Findings and Results

Heart Rate Estimation

The study evaluated FBST’s HR estimation performance using three public datasets (UBFC-rPPG, PURE, COHFACE) and a private dataset collected by the authors.

  • Performance Results: On the UBFC-rPPG dataset, FBST achieved a Root Mean Square Error (RMSE) of 2.79, compared with 3.70 for PhysNet. On the private dataset, FBST reached an RMSE of 2.41. Compared with traditional methods (e.g., ICA, PCA), FBST remained more accurate under challenging lighting conditions.
  • Signal Analysis: Visual analysis of the extracted pulse signals showed that the estimated results matched well with ground truth, confirming FBST’s capability to accurately capture rhythmic patterns in HR signals in the time domain.
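For readers unfamiliar with the error metrics reported in this section, a minimal sketch of RMSE and MAE follows. The sample values in the usage note are made up for illustration and are not results from the paper.

```python
import math

def mae(pred, truth):
    """Mean Absolute Error: average size of the per-sample error."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

def rmse(pred, truth):
    """Root Mean Square Error: like MAE, but large errors weigh more."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))
```

For estimates `[70, 80, 90]` against references `[72, 80, 86]`, the MAE is 2.0 while the RMSE is about 2.58, because the single 4-unit miss dominates the squared average. RMSE is never smaller than MAE on the same data.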

Respiratory Rate Estimation

For RR estimation, FBST achieved an RMSE of 3.62 on the COHFACE dataset and 5.27 on the private dataset, outperforming state-of-the-art deep learning methods (e.g., PhysNet, TS-CAN). Notably, this study is the first to demonstrate RR estimation on publicly available datasets (e.g., COHFACE) using short 10-second video windows, making it viable for real-time respiratory monitoring.
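The sketch below shows how a recording might be cut into 10-second windows, along with a piece of standard signal-processing arithmetic that helps explain why short windows are demanding for RR. The frame rate, the step size, and both function names are assumptions for illustration; the resolution formula is general Fourier analysis and is not taken from the paper.

```python
def split_windows(n_frames, fps, window_s=10.0, step_s=10.0):
    """Return (start, end) frame index pairs for fixed-length windows."""
    win = int(round(window_s * fps))
    step = int(round(step_s * fps))
    return [(start, start + win) for start in range(0, n_frames - win + 1, step)]

def spectral_resolution_per_min(window_s):
    """Spacing of DFT frequency bins for a window, in cycles per minute."""
    return 60.0 / window_s
```

A 30-second clip at an assumed 30 frames per second gives five half-overlapping windows with `split_windows(900, 30, 10.0, 5.0)`. A plain spectrum of one 10-second window has bins spaced 6 cycles per minute apart, which is coarse relative to typical breathing rates and is one reason a learned estimator is attractive for short windows.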

Data Resampling

Analyzing HR distributions in the UBFC and PURE datasets revealed imbalances. To address this, a data resampling strategy was adopted, which significantly reduced Mean Absolute Error (MAE) and RMSE, particularly for low HR ranges.
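The text does not specify which resampling strategy was used. As one plausible illustration, the sketch below balances a training set by oversampling under-represented HR bins with replacement. The bin width, the `(clip_id, hr)` sample layout, and the function name are all hypothetical.

```python
import random
from collections import defaultdict

def oversample_by_hr_bin(samples, bin_width=10, seed=0):
    """Pad every HR bin up to the size of the largest bin.

    samples: non-empty list of (clip_id, hr) pairs, hr in beats per minute.
    Extra items are drawn with replacement from the same bin.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for sample in samples:
        bins[int(sample[1] // bin_width)].append(sample)
    target = max(len(group) for group in bins.values())
    balanced = []
    for key in sorted(bins):
        group = bins[key]
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced
```

After balancing, rare low-HR clips appear as often as common mid-range clips during training, which is consistent with the reported error reduction in low HR ranges. Oversampling duplicates data rather than adding information, so it is usually applied to the training split only.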

Significance and Outlook

Scientific Value and Applications

  1. Illumination Modeling Innovation: This study introduces a novel foreground-background illumination modeling approach, addressing a core challenge in video-based physiological signal measurement under complex lighting.
  2. Real-Time Monitoring Capability: FBST keeps computational demands low by pairing a lightweight ST Layer with ResNet-18, which supports efficient real-time cardiorespiratory monitoring.
  3. Practical Applications: FBST is well suited to non-intrusive scenarios such as telemedicine and stress monitoring.

Future Research Directions

Several limitations remain. For example, severe head movements and highly dynamic lighting conditions were not fully addressed. Future research should focus on optimizing background ROI detection methods and enhancing adaptability in dynamic environments. Additionally, the choice of complex nonlinear models currently relies on qualitative analysis; quantitative methods could be explored for automatic model selection.

Conclusion

This study addresses lighting interference in video-based HR and RR estimation by proposing the FBST method. By mitigating background illumination disturbances, FBST improves estimation accuracy and supports real-time monitoring. The work offers the research community an efficient tool and contributes to the development of non-contact healthcare monitoring technologies.