A General Debiasing Framework with Counterfactual Reasoning for Multimodal Public Speaking Anxiety Detection
Academic Background and Problem Introduction
Public Speaking Anxiety (PSA) is widespread in education, especially among learners speaking in a non-native language. It impairs learners’ ability to express themselves and may also hinder their personal development. To help learners overcome it, researchers have begun exploring how to detect anxiety states automatically from multimodal data (video, audio, and text). However, existing Multimodal Public Speaking Anxiety Detection (MPSAD) models are prone to latent biases picked up during training, such as context bias, label bias, and keyword bias. These biases can lead a model to rely too heavily on superficial features instead of fully using the multimodal information, which reduces detection accuracy.
To address this issue, the authors proposed a General Multimodal Counterfactual Reasoning Debiasing Framework (GMCR), which aims to remove hybrid biases in multimodal data from a causal perspective and thereby improve model robustness and accuracy.
Paper Source and Author Information
This paper was co-authored by Tingting Zhang, Yangfu Zhu, Bin Wu, and others from the School of Computer Science (National Pilot School of Software Engineering) at Beijing University of Posts and Telecommunications. It was published in the 2025 issue of the journal Neural Networks. The title of the paper is A General Debiasing Framework with Counterfactual Reasoning for Multimodal Public Speaking Anxiety Detection.
Research Process and Experimental Design
1. Problem Definition and Dataset Construction
The study first defined the MPSAD task as a multi-class classification problem. To validate the GMCR framework, the researchers constructed a new Multimodal English Public Speaking Anxiety (ME-PSA) dataset. It contains 794 speech videos from 365 participants, totaling 47.84 hours, divided into 15,378 video clips, each annotated with one of five anxiety levels. The study also used the publicly available SAC (Speaking Anxiety in Class) dataset and the CMU-MOSEI dataset for comparative experiments.
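To give a feel for the scale of ME-PSA, the averages implied by the reported figures can be worked out directly. This is only a back-of-the-envelope sketch: it assumes the clips together cover the full 47.84 hours, which the summary does not state, and the variable names are illustrative.

```python
# Figures as reported for the ME-PSA dataset.
participants = 365
videos = 794
total_hours = 47.84
clips = 15_378

# Simple averages; they assume the clips span the whole recorded duration.
videos_per_participant = videos / participants
minutes_per_video = total_hours * 60 / videos
seconds_per_clip = total_hours * 3600 / clips

print(f"{videos_per_participant:.2f} videos per participant on average")
print(f"{minutes_per_video:.2f} minutes per video on average")
print(f"{seconds_per_clip:.1f} seconds per clip on average")
```

Under that assumption, a clip lasts roughly 11 seconds on average, so each anxiety label describes a short segment rather than a whole speech.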
2. GMCR Framework Design
The core of the GMCR framework is to eliminate hybrid biases in multimodal data through counterfactual reasoning. Specifically, the framework includes the following three key modules:
- Causal Disentanglement Module: Separate causal and bias extractors decompose the input of each modality into causal features and bias features. The Hilbert-Schmidt Independence Criterion (HSIC) is used as a constraint to keep the two sets of features independent.
- Counterfactual Branch Module: A counterfactual world is constructed by assuming the model only sees bias features, thereby evaluating the direct negative impact of biases on model predictions.
- Counterfactual Debiasing Module: During the inference stage, the Natural Direct Effect (NDE) is subtracted from the Total Effect (TE) to obtain the Total Indirect Effect (TIE), enabling unbiased predictions.
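The two computations named in the modules above, the HSIC independence measure and the TIE = TE - NDE subtraction, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the biased HSIC estimator, the choice to subtract class logits without any scaling factor, and the function names `hsic` and `debias_logits` are all assumptions made for the example.

```python
import math


def rbf_gram(xs, sigma=1.0):
    """Gaussian (RBF) Gram matrix for a list of equal-length feature vectors."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [[math.exp(-sqdist(a, b) / (2 * sigma ** 2)) for b in xs] for a in xs]


def center(k):
    """Double-center a Gram matrix, i.e. H K H with H = I - (1/n) * ones."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2 (assumed estimator)."""
    n = len(xs)
    kc = center(rbf_gram(xs, sigma))
    l = rbf_gram(ys, sigma)
    return sum(kc[i][j] * l[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2


def debias_logits(te_logits, nde_logits):
    """TIE = TE - NDE, applied elementwise to class logits (assumed form)."""
    return [t - b for t, b in zip(te_logits, nde_logits)]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# HSIC is larger for dependent features than for a constant (independent) signal.
causal = [[0.0], [1.0], [2.0], [3.0]]
constant = [[5.0], [5.0], [5.0], [5.0]]
dependent_score = hsic(causal, causal)
independent_score = hsic(causal, constant)

# Made-up logits: the bias-only branch strongly favours class 0.
te = [2.0, 1.5, 0.2]    # factual prediction (total effect)
nde = [1.8, 0.3, 0.1]   # counterfactual prediction from bias features only
tie = debias_logits(te, nde)
print(argmax(te), argmax(tie))
```

In this toy case the factual logits pick class 0, but most of that evidence also appears in the bias-only branch, so after the subtraction the prediction moves to class 1. During training, an HSIC term like the one above would be minimized to push causal and bias features apart.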
3. Experiments and Result Analysis
The study conducted extensive experiments on the ME-PSA, SAC, and CMU-MOSEI datasets, comparing the performance of the GMCR framework with various existing methods. The results showed that the GMCR framework significantly outperformed existing methods across multiple evaluation metrics. For example, on the SAC dataset, GMCR improved the 4-class accuracy of the LAD model from 53.64% to 56.36% and the F1 score from 41.54% to 45.89%. Additionally, GMCR demonstrated strong generalization capabilities on the CMU-MOSEI dataset, further validating its effectiveness.
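The SAC figures quoted above can be restated as absolute and relative gains. The sketch below only does arithmetic on the reported numbers; the dictionary layout and names are illustrative.

```python
# 4-class results on SAC for the LAD model, as reported (in percent).
baseline = {"accuracy": 53.64, "f1": 41.54}
with_gmcr = {"accuracy": 56.36, "f1": 45.89}

gains = {}
for metric, before in baseline.items():
    absolute = with_gmcr[metric] - before   # percentage points
    relative = 100 * absolute / before      # percent of the baseline value
    gains[metric] = (absolute, relative)
    print(f"{metric}: +{absolute:.2f} points ({relative:.1f}% relative)")
```

This works out to a gain of 2.72 points in accuracy and 4.35 points in F1, so the F1 improvement is the larger of the two in both absolute and relative terms.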
4. Ablation Study and Parameter Sensitivity Analysis
To evaluate the contribution of each module in the GMCR framework, the study conducted systematic ablation experiments. The results showed that removing the causal disentanglement module or the counterfactual branch module led to a significant drop in model performance, indicating that these modules play a critical role in the debiasing process. Furthermore, parameter sensitivity experiments revealed that model performance peaked when the independence constraint parameters α and β were set to 1.0.
Research Conclusions and Significance
The GMCR framework successfully addressed the issue of hybrid biases in the MPSAD task by introducing causal reasoning and counterfactual analysis. Its main contributions include:
1. Generality: The GMCR framework does not depend on a specific type of bias and can handle multiple biases simultaneously. It is designed to be model-agnostic, so it can be attached to existing MPSAD models.
2. Effectiveness: Experimental results demonstrated that GMCR significantly improved model detection accuracy and robustness, achieving the best performance across multiple datasets.
3. Innovation: According to the authors, GMCR is the first framework to apply counterfactual reasoning to the MPSAD task, providing new insights for debiasing research on multimodal data.
Research Highlights and Value
- Importance of the Problem: Public speaking anxiety detection is of great significance in the field of education, and the GMCR framework effectively addresses bias issues in existing models, providing technical support for personalized teaching.
- Innovation of the Method: The GMCR framework achieves unbiased predictions in multimodal data through causal disentanglement and counterfactual reasoning, offering high theoretical and practical value.
- Richness of Data: The ME-PSA dataset constructed in the study is large-scale and finely annotated, providing valuable data resources for future related research.
Other Valuable Information
The study also showcased the advantages of the GMCR framework in practical applications through case studies. For example, in a case involving context bias and keyword bias, GMCR successfully corrected the baseline model’s incorrect predictions, demonstrating its effectiveness in handling complex bias scenarios.
In summary, the GMCR framework offers a causal, model-agnostic approach to debiasing in the MPSAD task, supported by a new dataset and by experiments on three datasets.