Inhibition Adaptation on Pre-Trained Language Models

InA: Inhibition Adaptation Method on Pre-trained Language Models

Pre-trained Language Models (LMs) have achieved significant results on Natural Language Processing (NLP) tasks. However, traditional fine-tuning methods suffer from redundant parameters, which hurts both efficiency and effectiveness. To address this challenge, this paper proposes a fine-tuning method called Inhibition Adaptation (InA), which reduces the number of added trainable weights and appropriately re-weights the knowledge taken from pre-trained language models.

Research Background and Issue

Fine-tuning pre-trained language models is currently a common way to solve NLP downstream tasks. However, classic fine-tuning updates all model parameters, which introduces redundant parameters, especially when the model is applied to new downstream tasks. Redundant parameters not only hurt efficiency but can also hinder performance improvements. To address this, existing studies adjust only a small number of added vectors while keeping most pre-trained parameters frozen; however, such methods still pass along redundant information. This study therefore proposes InA to achieve more efficient fine-tuning with fewer trainable parameters.

Research Source

The paper was authored by Cheng Kang, Jindrich Prokop, Lei Tong, Huiyu Zhou, Yong Hu, and Daniel Novak, from Czech Technical University, the University of Leicester, and the University of Hong Kong. It was published in the journal Neural Networks and accepted on May 23, 2024.

Research Method and Process

1. Research Process

  a. Insertion of Trainable Vectors: Insert a small trainable vector into each Transformer attention structure.
  b. Setting Thresholds: Set thresholds that inhibit the transmission of irrelevant information, directly eliminating irrelevant knowledge (see the sketch below).
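The description above is high-level; the PyTorch sketch below illustrates how a small trainable low-rank path with a threshold and a short-negative-tail activation could be attached to a frozen attention block. This is an illustrative reconstruction, not the authors' exact formulation: the module names, the rank, the threshold value, and the use of GELU after a threshold shift are all assumptions made for the example.

```python
# Illustrative InA-style adapter (NOT the paper's exact formulation).
# Assumptions: a low-rank down/up projection, a scalar threshold, and a
# GELU whose short negative tail inhibits low-relevance signals.
import torch
import torch.nn as nn


class InhibitionAdapter(nn.Module):
    def __init__(self, d_model: int, rank: int = 8, threshold: float = 0.0):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)   # compress
        self.up = nn.Linear(rank, d_model, bias=False)     # expand
        self.threshold = threshold                         # inhibition threshold
        self.act = nn.GELU()                               # short negative tail
        nn.init.zeros_(self.up.weight)                     # start as a no-op path

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Shift by the threshold, then let GELU squash whatever falls
        # below it, so weakly relevant signals are inhibited.
        delta = self.up(self.act(self.down(hidden) - self.threshold))
        return hidden + delta


class AdaptedSelfAttention(nn.Module):
    """Frozen attention block plus the small trainable adapter."""

    def __init__(self, frozen_attn: nn.MultiheadAttention, d_model: int):
        super().__init__()
        self.attn = frozen_attn
        for p in self.attn.parameters():
            p.requires_grad = False                        # keep PLM weights fixed
        self.adapter = InhibitionAdapter(d_model)          # only this part trains

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        return self.adapter(attn_out)


# Usage: a 768-dim toy attention layer with BERT-base-sized hidden states.
attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
layer = AdaptedSelfAttention(attn, d_model=768)
x = torch.randn(2, 16, 768)                                # (batch, seq, hidden)
print(layer(x).shape)                                      # torch.Size([2, 16, 768])
```

Only the adapter's parameters receive gradients here, which is what keeps the number of adjustable weights small relative to full fine-tuning.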

2. Research Subjects

The research evaluates three pre-trained language models, BERT-large, RoBERTa-large, and DeBERTa-large, mainly on text classification and question-answering tasks. Experiments were conducted on the GLUE benchmark and the SQuAD v1.1 and SQuAD v2.0 datasets.
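For readers who want to reproduce the evaluation setup, the benchmarks listed above are available through the Hugging Face `datasets` library; this loading snippet is a convenience sketch and not tooling prescribed by the paper ("cola" is just one of several GLUE subsets).

```python
# Convenience sketch for loading the evaluation data named above with the
# Hugging Face `datasets` library (not part of the paper itself).
from datasets import load_dataset

cola = load_dataset("glue", "cola")      # one GLUE subset; others: "sst2", "mrpc", ...
squad_v1 = load_dataset("squad")         # SQuAD v1.1
squad_v2 = load_dataset("squad_v2")      # SQuAD v2.0 adds unanswerable questions

print(cola["train"][0])                  # {'sentence': ..., 'label': ..., 'idx': ...}
```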

3. Experimental Methods

The experiments mainly involve the following steps and techniques:

  1. Inhibition Mechanism: Introduce an inhibition mechanism to control information transmission by setting specific thresholds.
  2. Choice of Activation Function: Select appropriate activation functions (such as GELU or LeakyReLU) to achieve the best inhibition effect.
  3. Low-Rank Decomposition and Information Compression: Similar to the LoRA method, use low-rank decomposition to compress information so that the model maintains performance while training far fewer parameters (a rough parameter count is sketched after this list).
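To make the LoRA-style compression concrete, here is a back-of-the-envelope parameter count under assumed settings: the hidden size of 1024 matches the "large" models used in the paper, while the rank of 8 is an illustrative choice rather than the paper's reported configuration.

```python
# Rough parameter count for a LoRA-style low-rank update of one dense
# projection matrix. Hidden size matches the "large" models; the rank
# is an assumed illustrative value.
d_model, rank = 1024, 8

full_update = d_model * d_model        # directly fine-tuning one d x d matrix
low_rank_update = 2 * d_model * rank   # W ~= W0 + B @ A, with A: r x d, B: d x r

print(f"full update of one matrix : {full_update:,} params")             # 1,048,576
print(f"low-rank (r={rank}) update   : {low_rank_update:,} params")      # 16,384
print(f"reduction                 : {full_update // low_rank_update}x")  # 64x
```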

Main Results

1. GLUE Benchmark Test Results

On the GLUE benchmark, InA performed exceptionally well on many tasks, particularly CoLA, SST-2, and MRPC. Specific results are listed below (see Table 3 of the paper):

  • BERT-large fine-tuned with InA achieved an MCC score of 65.9 on CoLA, surpassing traditional fine-tuning methods.
  • RoBERTa-large fine-tuned with InA performed well across multiple tasks, standing out in particular on CoLA and MRPC.

2. SQuAD Question-Answering Tasks

On SQuAD v1.1 and v2.0, models fine-tuned with InA performed well on both the F1 and Exact Match (EM) metrics. Specific data are given below (see Table 4 of the paper):

  • BERT-large achieved F1/EM scores of 91.3/84.6 on SQuAD v1.1, slightly better than traditional methods.
  • RoBERTa-large also showed significant improvements in F1/EM scores on SQuAD v2.0.

Conclusion and Significance

This study effectively reduces redundant information transmission during fine-tuning by introducing an inhibition mechanism, thereby improving the model’s performance in downstream tasks. The main conclusions are as follows:

  1. Scientific Value: By combining an appropriate inhibition mechanism with low-rank decomposition, the InA method provides a more efficient way to fine-tune pre-trained language models: it reduces the number of trainable parameters while inhibiting the transmission of unrelated information.
  2. Application Value: InA's performance across multiple NLP tasks highlights its potential in practical applications and provides strong support for further improving the fine-tuning of pre-trained language models.

Research Highlights

  1. Innovativeness: The proposed InA method introduces an inhibition mechanism that is novel and effective compared with existing fine-tuning methods.
  2. Practicality: By reducing redundant parameters and effectively inhibiting unrelated information, InA improves the model's adaptability and performance.
  3. Broad Applicability: InA performs well across different language models and tasks, especially text classification and question answering.

Other Valuable Information

When choosing activation functions and setting appropriate thresholds, GELU and LeakyReLU gave better results thanks to their short negative tails. In addition, InA effectively suppresses low-relevance or unrelated information when handling downstream tasks, making the model focus more on task-related features.
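A quick numeric illustration of the "short negative tail" point (the input values are chosen arbitrarily; this is not an experiment from the paper): for negative pre-activations, i.e. signals below the inhibition threshold, GELU and LeakyReLU pass through only a small residue, whereas an identity mapping would pass them on unchanged.

```python
# Compare how different activations treat negative (inhibited) inputs.
# Inputs are arbitrary illustrative values, not data from the paper.
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, -0.5, 0.5, 2.0])

def show(name: str, t: torch.Tensor) -> None:
    print(f"{name:>10}: {[round(v, 3) for v in t.tolist()]}")

show("identity", x)                                        # negatives pass unchanged
show("gelu", F.gelu(x))                                    # large negatives shrink toward 0
show("leaky_relu", F.leaky_relu(x, negative_slope=0.01))   # negatives scaled by 0.01
show("relu", F.relu(x))                                    # negatives clipped to exactly 0
```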

Future Work

Future research will focus on applying InA to other NLP tasks and on further optimizing the parameter settings of the inhibition mechanism to achieve better fine-tuning results. Moreover, further experiments on multiple-choice tasks such as SWAG should help explain why InA did not show significant improvement on some tasks. Overall, this paper summarizes the application and effects of InA in fine-tuning pre-trained language models, showcasing its potential for reducing redundant parameters and enhancing task performance.