Rehearsal-Based Continual Learning with Dual Prompts
Academic Background
Continual learning is an important research direction in machine learning and neural networks. Its goal is to enable a model to keep acquiring new knowledge across a sequence of tasks while retaining what it has already learned. The central obstacle is catastrophic forgetting: when a model learns a new task, it rapidly loses previously learned knowledge, and its performance on old tasks drops sharply. This issue is particularly prominent in real-world applications, where models must continuously learn and adapt in changing environments.
To address this issue, researchers have proposed various methods, among which rehearsal-based methods are a common solution. These methods store representative samples from old tasks and replay them when learning new tasks to consolidate old knowledge. However, existing rehearsal methods face two main problems: 1) the limited number of stored samples weakens the model's generalization ability when learning new tasks; and 2) although knowledge distillation can transfer old knowledge, overly rigid alignment constraints can limit the model's ability to learn new knowledge.
To alleviate these problems, a research team from Nanjing University of Information Science and Technology, Nanjing Forestry University, Southeast University, and Nanjing University of Posts and Telecommunications proposed a rehearsal-based continual learning method with dual prompts, termed DUPT. This method introduces an input-aware prompt and a proxy feature prompt to enhance the model's generalization ability and knowledge transfer efficiency from both the input and feature perspectives.
Source of the Paper
This paper was co-authored by Shengqin Jiang, Daolong Zhang, Fengna Cheng, Xiaobo Lu, and Qingshan Liu. The authors are affiliated with the School of Computer Science at Nanjing University of Information Science and Technology, the College of Mechanical and Electronic Engineering at Nanjing Forestry University, the School of Automation at Southeast University, and the School of Computer Science at Nanjing University of Posts and Telecommunications. The paper was published in 2025 in the journal Neural Networks, titled “DUPT: Rehearsal-based Continual Learning with Dual Prompts.”
Research Process
1. Input-aware Prompt
In the process of continual learning, the number of samples for new tasks is usually limited, which restricts the model’s generalization ability. To address this issue, DUPT introduces the input-aware prompt, which dynamically expands the input distribution to help the model better capture the features of new task samples.
Specifically, the input-aware prompt is generated through the following steps (see the sketch below):
1. Input Data Preprocessing: The input image is downsampled to a resolution of 16×16 to reduce computational complexity.
2. Attention Mechanism: The downsampled image is fed into a frozen attention module to produce attention vectors.
3. Weight Generation: The attention vectors are passed through a fully connected layer to produce one weight for each prompt in the prompt pool.
4. Prompt Generation: The prompts in the pool are weighted by these values and summed to form the final input-aware prompt.
The advantage of the input-aware prompt lies in its ability to use limited prompts to generate diverse input distributions, thereby enhancing the model’s generalization ability.
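For concreteness, the following PyTorch-style sketch shows one way the four steps above could be wired together. The pool size, tensor shapes, frozen random attention module, and the additive fusion of the prompt with the input are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InputAwarePrompt(nn.Module):
    """Minimal sketch of an input-aware prompt generator (shapes are assumptions)."""

    def __init__(self, pool_size=10, img_size=32, down_size=16):
        super().__init__()
        # Prompt pool: pool_size learnable prompts shaped like the input image.
        self.prompt_pool = nn.Parameter(0.02 * torch.randn(pool_size, 3, img_size, img_size))
        self.down_size = down_size
        # Frozen attention module over the downsampled pixel tokens (3 channels per token).
        self.attn = nn.MultiheadAttention(embed_dim=3, num_heads=1, batch_first=True)
        for p in self.attn.parameters():
            p.requires_grad = False
        # Fully connected layer mapping the attention summary to one weight per prompt.
        self.to_weights = nn.Linear(3, pool_size)

    def forward(self, x):
        # 1) Downsample the input to 16x16 to reduce computation.
        x_small = F.interpolate(x, size=(self.down_size, self.down_size),
                                mode="bilinear", align_corners=False)
        # 2) Treat each pixel as a token and run the frozen attention module.
        tokens = x_small.flatten(2).transpose(1, 2)      # (B, 16*16, 3)
        attn_out, _ = self.attn(tokens, tokens, tokens)  # (B, 16*16, 3)
        summary = attn_out.mean(dim=1)                   # (B, 3)
        # 3) One weight per prompt in the pool.
        weights = torch.softmax(self.to_weights(summary), dim=-1)  # (B, pool_size)
        # 4) Weighted sum over the prompt pool gives the input-aware prompt.
        prompt = torch.einsum("bp,pchw->bchw", weights, self.prompt_pool)
        # Fusing by addition is an illustrative choice, not the paper's exact rule.
        return x + prompt
```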
2. Proxy Feature Prompt
In continual learning, the transfer of old knowledge is typically achieved through knowledge distillation. However, directly aligning the features of old and new models may limit the model’s ability to learn new knowledge. To address this issue, DUPT introduces the proxy feature prompt, which constructs learnable intermediate feature representations to alleviate feature conflicts.
Specifically, the proxy feature prompt is generated through the following process (see the sketch below):
1. Prompt Pool Initialization: A prompt pool containing a fixed number of learnable prompts is initialized.
2. Feature Extraction: The prompts in the pool are passed through convolutional and fully connected layers to produce the proxy feature prompt.
3. Knowledge Distillation: An optimization objective constrains the difference between the current model's features and the proxy feature prompt, while keeping the proxy feature prompt consistent with the old model's features.
The advantage of the proxy feature prompt is that it avoids direct alignment between the features of old and new models, thereby enhancing the model’s ability to learn new knowledge while retaining old knowledge.
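As a rough illustration of this idea, the sketch below builds a proxy feature from a small prompt pool and uses it as the intermediary in feature distillation. The layer sizes, the use of MSE for both terms, and the exact mapping from pool to feature vector are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxyFeaturePrompt(nn.Module):
    """Minimal sketch: learnable prompts mapped to a feature-sized proxy (shapes assumed)."""

    def __init__(self, pool_size=8, prompt_dim=16, feat_dim=512):
        super().__init__()
        # 1) Prompt pool with a fixed number of learnable prompts.
        self.prompt_pool = nn.Parameter(0.02 * torch.randn(pool_size, prompt_dim))
        # 2) Convolutional + fully connected layers that map the pool to a proxy feature.
        self.conv = nn.Conv1d(pool_size, 1, kernel_size=3, padding=1)
        self.fc = nn.Linear(prompt_dim, feat_dim)

    def forward(self, batch_size):
        proxy = self.conv(self.prompt_pool.unsqueeze(0))  # (1, 1, prompt_dim)
        proxy = self.fc(proxy.squeeze(1))                 # (1, feat_dim)
        return proxy.expand(batch_size, -1)               # broadcast to the batch


def proxy_feature_loss(feat_new, feat_old, proxy):
    """3) Indirect feature distillation through the proxy.

    The new features are pulled toward the proxy and the proxy is kept consistent
    with the (frozen) old features, so old and new features are never aligned
    directly. Using MSE for both terms is an assumption.
    """
    return F.mse_loss(feat_new, proxy) + F.mse_loss(proxy, feat_old.detach())
```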
3. Optimization Objective
The optimization objective of DUPT combines the following terms (a combined sketch follows the next paragraph):
1. Cross-entropy Loss: Optimizes the model on the current task's data.
2. Rehearsal Cross-entropy Loss: Optimizes the model on old-task data stored in the replay buffer.
3. Rehearsal Logit Distillation Loss: Constrains the difference between the current model's and the old model's outputs on the replayed data.
4. Feature Distillation Loss: Constrains the difference between the current model's features and the proxy feature prompt.
By jointly optimizing these objectives, DUPT can enhance both the stability and plasticity of the model during continual learning.
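To show how these terms might fit together in practice, here is a hedged sketch of a single training step. The model interface (returning both logits and features), the buffer API (returning replayed images, labels, and stored logits), and the loss weights lambda_ce, lambda_logit, and lambda_feat are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def dupt_style_step(model, old_model, proxy_prompt, batch, buffer,
                    lambda_ce=1.0, lambda_logit=0.5, lambda_feat=0.1):
    """One training step combining the four objectives (all weights illustrative).

    Assumes model(x) returns (logits, features), buffer.sample() returns replayed
    images, labels, and the logits stored with them, and old_model is the frozen
    snapshot of the previous model.
    """
    x, y = batch
    logits, _ = model(x)
    # 1) Cross-entropy on the current task.
    loss = F.cross_entropy(logits, y)

    # 2) + 3) Rehearsal cross-entropy and logit distillation on the replay buffer.
    x_buf, y_buf, old_logits_buf = buffer.sample()
    logits_buf, feats_buf = model(x_buf)
    loss = loss + lambda_ce * F.cross_entropy(logits_buf, y_buf)
    loss = loss + lambda_logit * F.mse_loss(logits_buf, old_logits_buf)

    # 4) Feature distillation through the proxy feature prompt.
    with torch.no_grad():
        _, old_feats_buf = old_model(x_buf)
    proxy = proxy_prompt(feats_buf.size(0))
    loss = loss + lambda_feat * (F.mse_loss(feats_buf, proxy)
                                 + F.mse_loss(proxy, old_feats_buf))
    return loss
```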
Main Results
DUPT was evaluated on multiple datasets, including CIFAR10, CIFAR100, and TinyImageNet. The experimental results show that DUPT performs strongly in continual learning tasks and, in particular, significantly outperforms existing methods when the buffer size is small.
- CIFAR10 Dataset: With a buffer size of 200, DUPT improved the average accuracy of DER++ by 4.92%.
- CIFAR100 Dataset: With a buffer size of 500, DUPT improved the average accuracy of DER++ by 3.41%.
- TinyImageNet Dataset: With a buffer size of 4000, DUPT improved the average accuracy of DER-BFP by 0.82%.
Additionally, DUPT demonstrated compatibility with existing methods. When combined with the latest DER-BFP method, DUPT achieved performance improvements of 1.30% and 1.34% on the CIFAR10 and CIFAR100 datasets, respectively.
Conclusion
DUPT enhances the generalization ability and knowledge transfer efficiency of continual learning models by introducing an input-aware prompt and a proxy feature prompt, addressing the input and feature perspectives respectively. The experimental results show that DUPT performs strongly on multiple datasets and significantly outperforms existing methods, especially when the buffer size is small. Furthermore, DUPT's compatibility allows it to be integrated seamlessly with existing continual learning methods, further improving their performance.
Research Highlights
- Dual Prompt Mechanism: DUPT enhances the model's generalization ability and knowledge transfer efficiency through an input-aware prompt and a proxy feature prompt, operating at the input and feature levels respectively.
- Significant Performance Improvement: DUPT achieves significant performance improvements on multiple datasets, especially when the buffer size is small.
- Strong Compatibility: DUPT can seamlessly integrate with existing continual learning methods, further enhancing performance.
Future Outlook
Although DUPT performs well in continual learning tasks, some issues still need further exploration. First, when the buffer size is small, DUPT's performance still lags behind that achieved with larger buffers; how to represent old knowledge more effectively remains an open question. Second, DUPT relies on models trained from scratch, which are prone to overfitting on small datasets. Future research could explore leveraging pre-trained models to alleviate this issue.
DUPT provides an effective solution for continual learning, with significant scientific value and application prospects.