Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation

Research on Empathetic Response Generation in AI Dialogue Systems

Academic Background

With the rapid development of artificial intelligence technology, open-domain dialogue systems have gradually become a research hotspot. These systems aim to engage in natural and fluent conversations with users, providing reasonable responses. However, despite significant progress in language fluency and coherence, existing dialogue systems still fall short in terms of empathy. Empathy refers to the ability to understand others’ experiences and emotions, encompassing both affective empathy and cognitive empathy. Affective empathy involves responding to users’ emotions, while cognitive empathy focuses on understanding the user’s situation. Empathy is a fundamental characteristic of human communication and is crucial for building human-like dialogue systems.

However, existing methods for empathetic response generation rely primarily on maximum likelihood estimation (MLE) as the optimization objective and fail to effectively align the empathy levels of generated and target responses. Empathy level is a fundamental concept in empathy theory, quantified through three key mechanisms: emotional reaction, interpretation, and exploration. Aligning the empathy levels of generated and target responses brings model outputs closer to human empathy expression and thereby improves the quality of generated responses.
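
To make empathy-level alignment concrete, the toy snippet below (not from the paper) represents a response's empathy level as per-mechanism scores (0 = no, 1 = weak, 2 = strong) and measures how far a generated response is from the target; smaller gaps mean better alignment. The specific numbers are purely illustrative.

```python
# Illustrative only: per-mechanism empathy levels for a target and a generated
# response, following the three mechanisms named above (0 = no, 1 = weak, 2 = strong).
target_levels    = {"emotional_reaction": 2, "interpretation": 1, "exploration": 0}
generated_levels = {"emotional_reaction": 1, "interpretation": 1, "exploration": 0}

# A simple gap measure: alignment means these per-mechanism levels should match.
gap = sum(abs(target_levels[m] - generated_levels[m]) for m in target_levels)
print(gap)  # 1 -> a smaller gap means better empathy-level alignment
```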

To address this issue, a research team from Hefei University of Technology and Dalian University of Technology proposed a reinforcement learning (RL)-based empathetic response generation framework, EmPRL (Empathetic Response Generation via Reinforcement Learning). This framework designs an effective empathy reward function and generates more empathetic dialogue responses by maximizing the expected reward through reinforcement learning.

Paper Source

The paper was co-authored by Hui Ma, Bo Zhang, Bo Xu, Jian Wang, Hongfei Lin, and Xiao Sun, and published in IEEE Transactions on Affective Computing in 2025. The title of the paper is “Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation.” The research team comes from Hefei University of Technology and Dalian University of Technology and specializes in natural language processing, dialogue systems, and affective computing.

Research Workflow

1. Task Definition and Framework Overview

The core task of the EmPRL framework is to generate empathetic responses that can understand users’ emotions and express empathy based on the conversational context. Specifically, given a context comprising multiple conversational turns, the model needs to generate a fluent, coherent, and empathetic response.

The main components of the EmPRL framework are as follows:

- Generator: A pre-trained T5 model serves as the generator, and fine-tuning it initializes the policy.
- Empathy Identifier: An empathy identifier is designed and trained to recognize the empathy level of a response within its dialogue context.
- Reward Function: A reward function built from the three empathy mechanisms (emotional reaction, interpretation, and exploration) aligns the empathy levels of generated and target responses.
- Reinforcement Learning Training: The Proximal Policy Optimization (PPO) algorithm trains the policy to generate responses that express both affective and cognitive empathy.

2. Fine-Tuning of the Generator

The research team first used the T5 model as the generator and performed full fine-tuning. During fine-tuning, the AdamW optimizer was used with an initial learning rate of 1.0e-4 and a batch size of 8. For inference, the maximum decoding length was set to 30 steps, and a top-k/top-p sampling strategy was applied.
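
The following is a minimal sketch (not the authors' code) of fine-tuning T5 as the generator and decoding with top-k/top-p sampling, using Hugging Face Transformers. The hyperparameters mirror those reported above (AdamW, learning rate 1e-4, at most 30 decoding steps); the "t5-base" checkpoint and the specific top_k/top_p values are assumptions.

```python
# Hedged sketch of generator fine-tuning and sampling, assuming a t5-base checkpoint.
from torch.optim import AdamW
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = AdamW(model.parameters(), lr=1e-4)  # learning rate as reported above

def training_step(contexts, responses):
    """One supervised fine-tuning step on a batch of (context, response) pairs."""
    inputs = tokenizer(contexts, return_tensors="pt", padding=True, truncation=True)
    labels = tokenizer(responses, return_tensors="pt", padding=True, truncation=True).input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding positions in the loss
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

def generate_response(context):
    """Decode with top-k/top-p sampling, capped at 30 new tokens (k and p are assumed values)."""
    inputs = tokenizer(context, return_tensors="pt")
    output = model.generate(
        **inputs, do_sample=True, top_k=50, top_p=0.95, max_new_tokens=30
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```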

3. Design and Training of the Empathy Identifier

The empathy identifier consists of two independent pre-trained T5 encoders that encode the context and the response, respectively. A single-head attention mechanism with a residual connection produces a context-aware representation of the response, which is then max-pooled and passed through a linear layer to predict the empathy level.
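
A minimal PyTorch sketch of this identifier is shown below. It follows the description above (two T5 encoders, single-head attention with a residual connection, max-pooling, and a linear classifier over three empathy levels), but the backbone name and other details are assumptions rather than the paper's exact configuration.

```python
# Hedged sketch of the empathy identifier architecture described above.
import torch
import torch.nn as nn
from transformers import T5EncoderModel

class EmpathyIdentifier(nn.Module):
    def __init__(self, model_name="t5-base", num_levels=3):
        super().__init__()
        # Two independent pre-trained T5 encoders for the context and the response.
        self.context_encoder = T5EncoderModel.from_pretrained(model_name)
        self.response_encoder = T5EncoderModel.from_pretrained(model_name)
        hidden = self.response_encoder.config.d_model
        self.attention = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.classifier = nn.Linear(hidden, num_levels)  # no / weak / strong

    def forward(self, context_ids, context_mask, response_ids, response_mask):
        ctx = self.context_encoder(input_ids=context_ids, attention_mask=context_mask).last_hidden_state
        resp = self.response_encoder(input_ids=response_ids, attention_mask=response_mask).last_hidden_state
        # Response tokens attend to the context to form a context-aware representation.
        attended, _ = self.attention(query=resp, key=ctx, value=ctx,
                                     key_padding_mask=~context_mask.bool())
        fused = resp + attended  # residual connection
        # Mask out padded response positions, then max-pool over the response tokens.
        fused = fused.masked_fill(~response_mask.bool().unsqueeze(-1), float("-inf"))
        pooled = fused.max(dim=1).values
        return self.classifier(pooled)  # logits over the three empathy levels
```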

The training of the empathy identifier utilized the Mental Health Subreddits dataset, which contains 3,000 pairs. Each pair’s emotional reaction, interpretation, and exploration mechanisms are individually labeled as no, weak, or strong. The research team trained three independent empathy identifiers, each tailored to a specific empathy mechanism.
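
Building on the sketch above, the three mechanism-specific identifiers could be set up roughly as follows; the label mapping (no/weak/strong to 0/1/2) is stated in the dataset description, while the batch layout and loss computation details are assumptions.

```python
# Hedged sketch: one identifier per mechanism, each trained independently with
# cross-entropy on the no/weak/strong labels. EmpathyIdentifier is the class
# from the sketch above; the batch format is a placeholder assumption.
import torch
import torch.nn as nn

LEVELS = {"no": 0, "weak": 1, "strong": 2}
MECHANISMS = ["emotional_reaction", "interpretation", "exploration"]

identifiers = {m: EmpathyIdentifier() for m in MECHANISMS}
criterion = nn.CrossEntropyLoss()

def identifier_loss(identifier, batch, mechanism):
    """batch holds tokenized context/response tensors plus per-mechanism string labels."""
    logits = identifier(batch["context_ids"], batch["context_mask"],
                        batch["response_ids"], batch["response_mask"])
    targets = torch.tensor([LEVELS[label] for label in batch[mechanism]])
    return criterion(logits, targets)
```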

4. Reinforcement Learning Training

During the reinforcement learning training phase, the research team employed the PPO algorithm to train the policy. The reward function consists of an empathy reward and a KL penalty term; the empathy reward aligns the empathy levels between generated and target responses, while the KL penalty prevents the policy from deviating excessively from the generator. During training, the AdamW optimizer was used with a learning rate of 1.0e-5 and a batch size of 32.
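
As a rough illustration (not the paper's exact formulation), the total reward can be thought of as an empathy-alignment term computed from the identifiers' predicted levels minus a scaled KL term that keeps the policy close to the fine-tuned generator. The gap-based alignment measure and the beta coefficient below are assumptions introduced for the sketch.

```python
# Hedged sketch of the reward used in the RL phase: empathy alignment minus a KL penalty.
import torch

MECHANISMS = ("emotional_reaction", "interpretation", "exploration")

def empathy_reward(level_of, context, generated, target):
    """level_of(mechanism, context, response) -> predicted empathy level in {0, 1, 2}.
    The reward is higher when generated and target responses have matching levels."""
    return -float(sum(abs(level_of(m, context, generated) - level_of(m, context, target))
                      for m in MECHANISMS))

def kl_penalty(policy_logprobs, ref_logprobs, beta=0.02):
    """Penalty keeping the policy near the fine-tuned generator; beta is an assumed coefficient.
    The log-probabilities are those of the sampled response tokens under each model."""
    return beta * torch.sum(policy_logprobs - ref_logprobs)

def total_reward(level_of, context, generated, target, policy_logprobs, ref_logprobs):
    """Empathy-alignment reward minus the KL penalty, fed to the PPO objective."""
    return empathy_reward(level_of, context, generated, target) - \
           kl_penalty(policy_logprobs, ref_logprobs).item()
```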

Main Results

1. Automatic Evaluation Results

The research team conducted experiments on the EmpatheticDialogues dataset to evaluate the performance of the EmPRL framework. The results showed that EmPRL achieved an Empathy F1-score (Emp-F1) of 69.43%, significantly outperforming existing baseline models. EmPRL also performed well in terms of the fluency and diversity of its generated responses.

2. Human Evaluation Results

Through human evaluations, the research team further validated the effectiveness of the EmPRL framework. In terms of empathy, relevance, and fluency, EmPRL significantly outperformed task-related baseline models. Moreover, when compared with ChatGPT, EmPRL proved highly competitive in empathetic expression.

Conclusion and Significance

The EmPRL framework successfully aligns the empathy levels between generated and target responses by designing an effective empathy reward function and utilizing reinforcement learning to maximize the expected reward. Experimental results demonstrate that EmPRL can generate responses encompassing both affective and cognitive empathy, significantly enhancing the empathetic capabilities of dialogue systems.

The scientific value of this research lies in proposing a new empathetic response generation framework that addresses the lack of empathy-level alignment in existing methods. Furthermore, the EmPRL framework has strong application potential and can be widely applied in scenarios such as psychological counseling, emotional companionship, and mental health support.

Research Highlights

  1. Innovative Empathy Reward Function: By combining the three empathy mechanisms—emotional reaction, interpretation, and exploration—an effective empathy reward function was designed to successfully align the empathy levels between generated and target responses.
  2. Application of Reinforcement Learning: Reinforcement learning was applied for the first time to the task of empathetic response generation, using the PPO algorithm to train the policy and generate more empathetic responses.
  3. Broad Application Prospects: This research holds significant academic importance and has broad application value in practical areas like psychological counseling and emotional companionship.

Other Valuable Information

The research team also noted that future work will further extend the framework to explore methods for maintaining empathy consistency across multi-turn dialogues and introduce retrieval-augmented generation techniques to further enhance the quality of empathetic response generation.