Analyzing the Visual Road Scene for Driver Stress Estimation
Academic Background
Driver stress is a significant contributor to traffic accidents, injuries, and fatalities. Studies attribute roughly 94% of traffic accidents to driver-related factors, and inattention, internal and external distraction, and improper speed control are all closely linked to driver stress. Identifying and managing driver stress states is therefore crucial for improving both the driving experience and road safety. However, existing methods for driver stress recognition rely primarily on physiological signals (such as heart rate or skin conductance) or vehicle operation data (such as steering wheel and pedal activity); these approaches typically require wearable devices or ignore the surrounding driving environment. In contrast, analyzing the visual road scene offers a non-intrusive and widely applicable way to estimate driver stress. This study explores the contribution of the visual road scene to driver stress estimation and validates its effectiveness with machine learning models.
Source of the Paper
This paper was co-authored by Cristina Bustos, Albert Sole-Ribalta, Neska Elhaouij, Javier Borge-Holthoefer, Agata Lapedriza, and Rosalind Picard, affiliated with the Universitat Oberta de Catalunya (UOC) and the MIT Media Lab. It was published in IEEE Transactions on Affective Computing in 2023.
Research Process and Results
1. Data Source and Preprocessing
The study used the publicly available AffectiveRoad dataset, which contains video from 13 real-world driving experiments covering road scenarios such as city streets and highways, together with continuous driver-reported stress ratings in the range [0, 1]. Semantic segmentation was applied to label the objects in each road scene (e.g., vehicles, pedestrians, traffic signs). The continuous ratings were discretized into three classes (low, medium, and high stress), yielding a dataset of 110,000 labeled video frames.
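As a concrete illustration of the label construction, the sketch below maps continuous stress ratings in [0, 1] to the three classes. The bin edges (1/3 and 2/3) are illustrative assumptions, not values reported by the paper.

```python
# Sketch: discretize continuous stress ratings into low/medium/high.
import numpy as np

def discretize_stress(values: np.ndarray) -> np.ndarray:
    """Map stress ratings in [0, 1] to class ids: 0=low, 1=medium, 2=high."""
    bins = np.array([1 / 3, 2 / 3])  # assumed equal-width bin edges
    return np.digitize(values, bins)

print(discretize_stress(np.array([0.10, 0.50, 0.90])))  # -> [0 1 2]
```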
2. Model Design and Training
The study evaluated the performance of several machine learning models, including:
- Single-frame baseline models: Random Forest, Support Vector Machine (SVM), and Convolutional Neural Network (CNN).
- Temporal Segment Networks (TSN) and two variants: TSN-W, which aggregates segment scores with learned weights, and TSN-LSTM, which aggregates them with a Long Short-Term Memory (LSTM) network (see the sketch after this list).
- Video Transformers: Transformer-based video classification models, including VideoMAE.
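To make the TSN-W idea concrete, here is a minimal PyTorch sketch of segment-level class scores aggregated with learned weights. The ResNet-18 backbone, the segment count, and the exact form of the weighting are assumptions for illustration; the paper's architectural details may differ.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TSNW(nn.Module):
    """Toy TSN with learned per-segment aggregation weights (TSN-W-style)."""

    def __init__(self, num_segments: int, num_classes: int = 3):
        super().__init__()
        backbone = models.resnet18(weights=None)  # assumed backbone
        backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
        self.backbone = backbone  # shared across all segments
        # One learnable weight per temporal segment (assumed form).
        self.segment_logits = nn.Parameter(torch.zeros(num_segments))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, segments, 3, H, W), one sampled frame per segment
        b, s = x.shape[:2]
        scores = self.backbone(x.flatten(0, 1)).view(b, s, -1)
        w = torch.softmax(self.segment_logits, dim=0)  # weights sum to 1
        return (w.view(1, s, 1) * scores).sum(dim=1)   # weighted consensus

model = TSNW(num_segments=8)
print(model(torch.randn(2, 8, 3, 224, 224)).shape)  # torch.Size([2, 3])
```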
The research adopted a “leave-one-driver-out” cross-validation strategy, splitting the data into training, validation, and test sets so that each model is always evaluated on drivers it never saw during training.
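A minimal sketch of this evaluation protocol, using scikit-learn's LeaveOneGroupOut with driver identity as the group; the toy features and the Random Forest classifier stand in for the actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.random((120, 16))                 # toy per-frame features
y = rng.integers(0, 3, size=120)          # low/medium/high labels
drivers = rng.integers(0, 5, size=120)    # driver id for each frame

accuracies = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=drivers):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[train_idx], y[train_idx])   # never sees the held-out driver
    accuracies.append(clf.score(X[test_idx], y[test_idx]))
print(f"mean accuracy over held-out drivers: {np.mean(accuracies):.2f}")
```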
3. Experimental Results
The experiments showed that the TSN-W model achieved the highest average accuracy, 0.77, significantly outperforming the single-frame baselines. TSN-LSTM and the Transformer models performed comparably to TSN-W, but TSN-W had advantages in computational efficiency and interpretability. Using gradient-weighted class activation mapping (Grad-CAM) together with semantic segmentation, the study also analyzed what the model attends to in high-stress scenes, finding that traffic congestion, pedestrians, and large vehicles were the main factors behind high-stress predictions.
4. Interpretability Analysis
By combining Grad-CAM and image segmentation techniques, the study quantified the model’s attention to road scene objects across different stress categories. The results showed that the model paid more attention to pedestrians, traffic signs, and large vehicles when predicting high stress, while focusing more on vegetation and fences in low-stress scenarios. These findings provide important insights into understanding the visual triggers of driver stress.
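A minimal sketch of how Grad-CAM attention can be attributed to scene objects: overlay the heatmap on the per-pixel segmentation map and sum the attention mass falling on each semantic class. The class ids and toy arrays below are illustrative, not taken from the paper.

```python
import numpy as np

def attention_per_class(heatmap: np.ndarray, seg_map: np.ndarray) -> dict:
    """Share of Grad-CAM attention mass on each semantic class.

    heatmap: HxW Grad-CAM map with values in [0, 1].
    seg_map: HxW integer map of semantic class ids.
    """
    total = heatmap.sum() + 1e-8  # avoid division by zero
    return {int(c): float(heatmap[seg_map == c].sum() / total)
            for c in np.unique(seg_map)}

# Toy 4x4 scene: most attention falls on class 1 (e.g., "pedestrian").
heat = np.array([[0.9, 0.8, 0.1, 0.0],
                 [0.7, 0.9, 0.0, 0.1],
                 [0.1, 0.0, 0.2, 0.1],
                 [0.0, 0.1, 0.1, 0.0]])
seg = np.zeros((4, 4), dtype=int)
seg[:2, :2] = 1
print(attention_per_class(heat, seg))
```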
Conclusion and Significance
This study demonstrates the feasibility of driver stress estimation based on visual road scenes and achieves high classification accuracy using the TSN-W model. The research not only validates the importance of visual contextual information for driver stress estimation but also provides theoretical support for developing safer urban road environments and driving assistance technologies. Additionally, the interpretability analysis reveals key objects in road scenes related to driver stress, offering new directions for research in related fields.
Research Highlights
- Innovative Methods: The study is the first to systematically evaluate the contribution of visual road scenes to driver stress estimation, and it proposes the efficient TSN-W model.
- High Accuracy: The TSN-W model achieved an average accuracy of 0.77 on the AffectiveRoad dataset, significantly outperforming baseline models.
- Interpretability Analysis: Through Grad-CAM and image segmentation techniques, the study reveals key objects in road scenes related to driver stress.
- Practical Application Value: The research results provide scientific evidence for developing vision-based driving assistance systems and safer road designs.
Other Valuable Information
The study also explored the model’s performance in different road scenarios (e.g., urban areas, highways, parking lots), finding that the model performed particularly well in urban scenarios. Additionally, the study compared the impact of various video lengths and frame rates on model performance, determining that a 40-second video sequence at 3 frames per second was the optimal configuration.
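As a quick sanity check on that configuration, sampling a 40-second clip at 3 frames per second yields 120 frames per input. The 30 fps source rate assumed below is an assumption, not a figure from the paper.

```python
SOURCE_FPS = 30      # assumed capture frame rate (not stated in the summary)
TARGET_FPS = 3       # from the reported best configuration
CLIP_SECONDS = 40

step = SOURCE_FPS // TARGET_FPS                         # keep every 10th frame
frame_indices = range(0, CLIP_SECONDS * SOURCE_FPS, step)
assert len(frame_indices) == CLIP_SECONDS * TARGET_FPS  # 120 frames per clip
print(len(frame_indices), list(frame_indices)[:5])      # 120 [0, 10, 20, 30, 40]
```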
This in-depth analysis not only deepens our understanding of the sources of driver stress but also provides important technical and methodological support for future research in related fields.