LineConvGraphs: Line Conversation Graphs for Effective Emotion Recognition Using Graph Neural Networks
A New Approach to Emotion Recognition in Conversations Based on Graph Neural Networks
Research Background
Emotion recognition (ER) is an important component of human-computer interaction (HCI), aiming to identify human emotional states by analyzing multimodal data such as speech, text, and video. The technology has broad applications in fields such as healthcare, education, social media, and chatbots. In recent years, research has gradually shifted from single-sentence emotion analysis to emotion recognition in conversations (ERC), which involves identifying the emotional state of each utterance in a dialogue. ERC is more challenging than single-sentence analysis because the emotion of an utterance is shaped not only by the utterance itself but also by the surrounding context and the interactions between speakers.
Traditional ERC methods rely primarily on sequence models such as recurrent neural networks (RNNs) and long short-term memory networks (LSTMs), which struggle with long-distance dependencies and complex contexts. To overcome these limitations, researchers turned to graph neural network (GNN)-based approaches that model a conversation as a graph, using nodes and edges to capture context and inter-speaker dependencies. Even so, existing GNN-based methods still have difficulty handling emotion shifts and speaker independence.
To address these problems, a research team from IIT Madras, the National Institute of Standards and Technology (NIST), and the University of Maryland proposed a novel graph construction method, Line Conversation Graphs (LineConGraphs), and developed two models based on it: LineConGCN and LineConGAT. The work was published in IEEE Transactions on Affective Computing in 2025.
Research Methods and Workflow
1. Construction of Line Conversation Graphs
The core idea of LineConGraphs is to model each utterance in a conversation as a node in a graph, with edges connecting adjacent nodes. Specifically, each node is connected to its previous and next nodes, capturing short-term contextual information. Additionally, to capture long-distance dependencies, the researchers used multi-layer graph convolutional networks (GCNs) or graph attention networks (GATs) to expand the receptive field of the nodes.
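As a concrete illustration, the sketch below builds such a line graph with PyTorch Geometric: each utterance becomes a node carrying its feature vector, and bidirectional edges link consecutive utterances. The function and variable names are illustrative and not taken from the paper's code.

```python
# Minimal sketch: build a line conversation graph with PyTorch Geometric.
# Each utterance is a node; bidirectional edges connect consecutive utterances.
import torch
from torch_geometric.data import Data

def build_line_conversation_graph(utterance_features: torch.Tensor) -> Data:
    """utterance_features: [num_utterances, feature_dim] node feature matrix."""
    num_utterances = utterance_features.size(0)
    src, dst = [], []
    for i in range(num_utterances - 1):
        # edge u_i -> u_{i+1} and the reverse edge u_{i+1} -> u_i
        src += [i, i + 1]
        dst += [i + 1, i]
    edge_index = torch.tensor([src, dst], dtype=torch.long)
    return Data(x=utterance_features, edge_index=edge_index)

# Example: a 5-utterance dialogue with 768-dimensional utterance features.
graph = build_line_conversation_graph(torch.randn(5, 768))
print(graph)  # Data(x=[5, 768], edge_index=[2, 8])
```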
In the experiments, the researchers used two major datasets: IEMOCAP and MELD. IEMOCAP contains 151 dialogues involving 10 speakers, while MELD includes 1,433 dialogues involving 304 speakers. Each dialogue was modeled as an independent graph, with node features extracted using the pre-trained EmoBERTa model.
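A hedged sketch of the feature-extraction step is shown below, using the Hugging Face Transformers library. The checkpoint name ("tae898/emoberta-base") and the choice of the first-token embedding as the utterance representation are assumptions made for illustration, not details confirmed by the paper.

```python
# Sketch: extract one EmoBERTa feature vector per utterance.
# NOTE: the checkpoint name and pooling strategy below are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("tae898/emoberta-base")  # assumed checkpoint
encoder = AutoModel.from_pretrained("tae898/emoberta-base")

@torch.no_grad()
def utterance_features(utterances: list[str]) -> torch.Tensor:
    batch = tokenizer(utterances, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state   # [num_utterances, seq_len, 768]
    return hidden[:, 0, :]                        # first-token embedding per utterance

features = utterance_features(["I can't believe it!", "Calm down, it's fine."])
```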
2. Embedding Sentiment Shift Information
To capture emotion shifts in conversations, the researchers embedded sentiment shift information into the edges of the graph. In the GCN model, sentiment shifts were encoded as edge weights; in the GAT model, they were encoded as edge features. Specifically, if the emotional state between two adjacent utterances changed, the edge weight or feature was labeled as “shift”; otherwise, it was labeled as “no shift.”
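A minimal sketch of this edge labeling is given below, assuming the shift/no-shift distinction is encoded numerically (1.0 for a shift, 0.0 otherwise) and derived by comparing the labels of the two utterances an edge connects; the exact encoding used in the paper may differ.

```python
# Sketch: attach sentiment-shift information to the edges of a line graph.
# shift = 1.0 when the two connected utterances carry different labels, else 0.0.
import torch

def sentiment_shift_edges(edge_index: torch.Tensor, labels: list[int]):
    src, dst = edge_index
    shift = torch.tensor(
        [1.0 if labels[s] != labels[d] else 0.0
         for s, d in zip(src.tolist(), dst.tolist())]
    )
    edge_weight = shift              # scalar edge weights for the GCN variant
    edge_attr = shift.unsqueeze(-1)  # [num_edges, 1] edge features for the GAT variant
    return edge_weight, edge_attr
```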
3. Model Training and Evaluation
Based on LineConGraphs, the researchers developed two models:
- LineConGCN: A GCN-based model using two GCN layers and ReLU activation functions.
- LineConGAT: A GAT-based model using two GATv2 layers to dynamically compute attention weights between nodes (a minimal sketch of both architectures follows below).
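The sketch below shows one way to implement the two architectures with PyTorch Geometric, following the layer choices described above (two GCNConv layers with ReLU, and two GATv2Conv layers). Hidden sizes, the number of attention heads, and the final linear classifier are assumptions.

```python
# Sketch of the two architectures; hyperparameters below are illustrative.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, GATv2Conv

class LineConGCN(torch.nn.Module):
    def __init__(self, in_dim=768, hidden_dim=256, num_classes=7):  # 7 classes for MELD
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.classifier = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index, edge_weight=None):
        # edge_weight can carry the sentiment-shift information for the GCN variant
        h = F.relu(self.conv1(x, edge_index, edge_weight))
        h = F.relu(self.conv2(h, edge_index, edge_weight))
        return self.classifier(h)  # one emotion logit vector per utterance

class LineConGAT(torch.nn.Module):
    def __init__(self, in_dim=768, hidden_dim=256, heads=4, num_classes=7):
        super().__init__()
        self.conv1 = GATv2Conv(in_dim, hidden_dim, heads=heads, edge_dim=1)
        self.conv2 = GATv2Conv(hidden_dim * heads, hidden_dim, heads=1, edge_dim=1)
        self.classifier = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index, edge_attr=None):
        # edge_attr can carry the sentiment-shift features for the GAT variant
        h = F.relu(self.conv1(x, edge_index, edge_attr))
        h = F.relu(self.conv2(h, edge_index, edge_attr))
        return self.classifier(h)
```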
The models were trained using the PyTorch Geometric framework, with categorical cross-entropy as the loss function and AdamW as the optimizer. The researchers evaluated the model performance using the weighted F1 score and compared it with state-of-the-art methods.
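A hedged sketch of this training setup follows, iterating over one dialogue graph at a time; the learning rate, epoch count, and batching strategy are assumptions.

```python
# Sketch: train on per-dialogue graphs and report weighted F1 on a test split.
import torch
from sklearn.metrics import f1_score

def train_and_evaluate(model, train_graphs, test_graphs, epochs=50, lr=1e-4):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()

    for _ in range(epochs):
        model.train()
        for g in train_graphs:                 # one Data object per dialogue
            optimizer.zero_grad()
            logits = model(g.x, g.edge_index)
            loss = criterion(logits, g.y)      # g.y: per-utterance emotion labels
            loss.backward()
            optimizer.step()

    model.eval()
    preds, golds = [], []
    with torch.no_grad():
        for g in test_graphs:
            preds += model(g.x, g.edge_index).argmax(dim=-1).tolist()
            golds += g.y.tolist()
    return f1_score(golds, preds, average="weighted")
```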
Research Results and Discussion
1. Model Performance Comparison
The experimental results showed that the LineConGAT model achieved weighted F1 scores of 64.58% on the MELD dataset and 76.50% on the IEMOCAP dataset, outperforming existing state-of-the-art methods. Additionally, embedding sentiment shift information further improved the performance of the GCN model, while the effect was less pronounced for the GAT model; the researchers attributed this to the GAT model's attention mechanism, which can already capture emotion shifts dynamically.
2. Embedding Speaker Information
To explore the impact of speaker information on model performance, the researchers added speaker embeddings to the models. Speaker embeddings yielded only a limited improvement on the MELD dataset and even slightly reduced performance on IEMOCAP, suggesting that the value of speaker information in ERC depends on the dataset.
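The sketch below shows one plausible way to add speaker embeddings: a learned embedding per speaker is concatenated with each utterance's features before the GNN layers. The embedding size and the concatenation strategy are assumptions, not the paper's confirmed design.

```python
# Sketch: inject speaker identity by concatenating a learned embedding
# with each utterance's feature vector (sizes are illustrative).
import torch

class SpeakerAugmentedFeatures(torch.nn.Module):
    def __init__(self, num_speakers: int, speaker_dim: int = 32):
        super().__init__()
        self.speaker_embedding = torch.nn.Embedding(num_speakers, speaker_dim)

    def forward(self, utterance_features: torch.Tensor, speaker_ids: torch.Tensor):
        # e.g. [num_utts, 768] + [num_utts, 32] -> [num_utts, 800]
        return torch.cat(
            [utterance_features, self.speaker_embedding(speaker_ids)], dim=-1
        )
```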
3. Comparison Between Fully Connected Graphs and LineConGraphs
To validate the effectiveness of LineConGraphs, the researchers also constructed fully connected conversation graphs, in which every pair of utterance nodes is connected. The experiments showed that LineConGraphs better capture local context and emotion shifts, whereas fully connected graphs degrade performance due to information overload.
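For contrast with the line graph construction shown earlier, here is a minimal sketch of a fully connected conversation graph: every pair of utterance nodes is linked, so each node aggregates information from the entire dialogue in a single layer.

```python
# Sketch: fully connected conversation graph (every pair of nodes linked).
import torch

def fully_connected_edge_index(num_utterances: int) -> torch.Tensor:
    src, dst = [], []
    for i in range(num_utterances):
        for j in range(num_utterances):
            if i != j:
                src.append(i)
                dst.append(j)
    return torch.tensor([src, dst], dtype=torch.long)  # [2, n*(n-1)] edges
```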
4. Error Analysis
Using confusion matrices, the researchers analyzed the model’s performance across emotion categories. The model performed best on the “neutral” class but tended to confuse similar emotions such as “anger” and “frustration,” or “happy” and “excited.” Embedding sentiment shift information significantly reduced misclassifications involving the “neutral” class.
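A small, self-contained sketch of this per-class error analysis is shown below; the label lists are dummy placeholders, and the emotion names are the IEMOCAP categories mentioned above.

```python
# Sketch: confusion matrix over gold vs. predicted emotion labels.
from sklearn.metrics import confusion_matrix

emotions = ["neutral", "happy", "sad", "angry", "excited", "frustrated"]
gold_labels = [0, 0, 3, 5, 4, 1]  # placeholder gold emotion indices
predicted = [0, 0, 5, 5, 1, 1]    # placeholder model predictions
cm = confusion_matrix(gold_labels, predicted, labels=list(range(len(emotions))))
# cm[i][j] counts how often gold emotion i was predicted as emotion j
```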
Research Conclusions and Future Directions
This study proposed a novel graph construction method, LineConGraphs, and developed the LineConGCN and LineConGAT models based on it. The experiments demonstrated that LineConGraphs effectively capture both short-term and long-term contextual information in conversations, improving emotion recognition accuracy. In particular, the LineConGAT model achieved state-of-the-art performance on both the MELD and IEMOCAP datasets.
Future research directions include:
1. Incorporating multimodal data (e.g., audio and video) into LineConGraphs to further enhance emotion recognition accuracy.
2. Exploring dynamic context modeling methods to enable the model to automatically adjust the context window size based on dialogue content.
3. Developing GNN models capable of handling speaker information, especially for large-scale datasets.
Research Highlights
- Innovative Graph Construction Method: LineConGraphs effectively captures short-term contextual information in conversations by connecting adjacent utterances, while extending long-distance dependency modeling capabilities through multi-layer GNNs.
- Embedding Sentiment Shift Information: For the first time, sentiment shift information was introduced into GNN-based models, significantly improving the performance of GCN models in emotion recognition.
- Exploration of Speaker Independence: Through comparative experiments, the role of speaker information in ERC was revealed, providing important references for future research.
- Validation Across Multiple Datasets: Experiments were conducted on two benchmark datasets—IEMOCAP and MELD—validating the model’s generalization ability across different scenarios.
This study provides a new approach and methodology for emotion recognition in conversations, offering significant theoretical value and practical application prospects.