Mitigating Social Biases of Pre-trained Language Models via Contrastive Self-Debiasing with Double Data Augmentation

Introduction:

Pre-trained language models (PLMs) are now widely applied in natural language processing, but they inherit and amplify social biases present in their training corpora. These biases can create unpredictable risks in real-world applications of PLMs. For example, due to gender bias, automatic job-screening systems tend to assign jobs requiring logical skills (e.g., doctor, programmer) to men and jobs requiring caring abilities (e.g., nurse, nanny) to women; medical systems may exhibit racial bias, estimating that Black patients appear more "frail" than White patients at the same risk level. Mitigating the social biases encoded in PLMs has therefore become a meaningful and challenging research area.

Paper Source:

This paper was published in Volume 332 of the journal Artificial Intelligence in 2024. The first author is Yingji Li and the second author is Mengnan Du; the authors are affiliated with the College of Computer Science and Technology at Jilin University, the Department of Data Science at the New Jersey Institute of Technology, the School of Artificial Intelligence at Jilin University, and the Key Laboratory of Computer-Aided Design and Computer Graphics of the Ministry of Education.

Research Content and Innovations:

This paper proposes a Contrastive Debiasing Model (CD3) that effectively mitigates social biases encoded in PLMs through two stages: double data augmentation and contrastive self-debiasing.

The double data augmentation stage first performs a first round of augmentation on the original corpus using sensitive attribute words (e.g., male/female) to obtain positive sample pairs. It then automatically searches for bias prompts that maximize the difference in PLM encodings across different demographic groups and concatenates them with the first-round augmented samples as a second round of data augmentation. This overcomes the limitation of previous data augmentation methods, which rely on human expertise.
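To make the first-round augmentation concrete, here is a minimal Python sketch of counterfactual substitution on gendered attribute words. The word list, function name, and example sentence are illustrative assumptions, not the paper's actual vocabulary or implementation.

```python
# Toy sketch of first-round counterfactual augmentation: swap sensitive
# attribute words to build a positive sample pair. Capitalization, punctuation,
# and pronoun ambiguity (e.g., possessive "her") are ignored for brevity.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "male": "female", "female": "male",
    "father": "mother", "mother": "father",
}

def counterfactual_swap(sentence: str) -> str:
    """Return a copy of the sentence with sensitive attribute words swapped."""
    tokens = sentence.split()
    return " ".join(GENDER_PAIRS.get(tok.lower(), tok) for tok in tokens)

original = "the doctor said he would review the results"
augmented = counterfactual_swap(original)   # positive pair: (original, augmented)
```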

The contrastive self-debiasing stage uses the augmented corpus to train a pluggable debiasing adapter through contrastive learning, mapping the PLM's sentence representations from the original biased space to a new debiased space without updating the PLM's parameters. The adapter is applicable to any PLM, saving substantial computational resources while preserving the PLM's language modeling capabilities.

The paper evaluates gender and racial debiasing on multiple real-world datasets and fairness metrics. The experimental results demonstrate that, compared to baseline models, CD3 achieves excellent debiasing performance on BERT, ALBERT, and RoBERTa while retaining the PLMs' language modeling capabilities.

Research Process and Methods:

I. Double Data Augmentation

1) Perform the first round of data augmentation on the original corpus by replacing sensitive attribute words to obtain positive sample pairs.

2) Automatically search for bias prompts: for each positive sample pair, find the prompt sequence within a given search space that maximizes the distance between the pair's sentence representations. Specifically, in each iteration, compute the cosine similarity of the sentence representations under the current prompt candidates, keep the Top-K candidates with the smallest similarity as this iteration's result, and concatenate them with the candidates for the next iteration. Repeat until the iterations end (a sketch of this search follows the list).

3) Concatenate the obtained bias prompts with the first-round augmented positive sample pairs to obtain the final augmented corpus.
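The following Python sketch illustrates the greedy Top-K bias prompt search described in step 2. The function and parameter names are assumptions for illustration; `encode` stands in for the PLM's sentence encoder (e.g., returning a pooled embedding), and `vocab` is a candidate token set.

```python
import torch.nn.functional as F

def search_bias_prompts(pair, encode, vocab, k=5, max_len=3):
    """Greedy Top-K search for bias prompts (illustrative sketch).

    pair:   a positive sample pair (sent_a, sent_b) from the first-round augmentation
    encode: callable mapping a sentence string to a PLM sentence embedding (1-D tensor)
    vocab:  candidate prompt tokens to search over
    """
    sent_a, sent_b = pair
    beams = [""]                                  # current prompt candidates
    for _ in range(max_len):                      # grow prompts one token per iteration
        scored = []
        for prefix in beams:
            for tok in vocab:
                prompt = (prefix + " " + tok).strip()
                # Concatenate the candidate prompt with both sentences and measure
                # how far apart the PLM pushes their representations.
                za = encode(sent_a + " " + prompt)
                zb = encode(sent_b + " " + prompt)
                sim = F.cosine_similarity(za, zb, dim=-1).item()
                scored.append((sim, prompt))
        # Keep the K prompts with the SMALLEST similarity, i.e. the largest
        # representation gap between the two demographic variants.
        scored.sort(key=lambda x: x[0])
        beams = [p for _, p in scored[:k]]
    return beams
```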

II. Contrastive Self-Debiasing

1) Input the augmented corpus into the PLM encoder to obtain sentence representations.

2) Use a trainable adapter G to map the sentence representations from the original space to a new space, outputting the debiased sentence representations.

3) Input the debiased representations of positive sample pairs into a contrastive loss function, which aims to minimize the distance between debiased representations of positive pairs and maximize the distance from other samples.

4) Train the parameters of the adapter G through contrastive learning, so that G learns to filter social biases out of the PLM's encoding space (a training sketch follows this list).

5) After training, the adapter G can be applied to any PLM to remove social biases before downstream tasks.
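Below is a minimal PyTorch sketch of the contrastive self-debiasing stage under the assumptions that the adapter G is a small MLP and that the contrastive objective is an InfoNCE-style loss over in-batch positive pairs; the module names, dimensions, and hyperparameters are illustrative, and `plm_encode` and `loader` are stand-ins for the frozen PLM encoder and the augmented-pair data loader.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DebiasAdapter(nn.Module):
    """Lightweight adapter G mapping PLM sentence embeddings to a debiased space."""
    def __init__(self, dim: int = 768, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

def contrastive_loss(za, zb, temperature: float = 0.05):
    """InfoNCE-style loss: pull each positive pair together and push it away
    from every other sample in the batch. za, zb: (batch, dim) adapter outputs
    for the two augmented views."""
    za = F.normalize(za, dim=-1)
    zb = F.normalize(zb, dim=-1)
    logits = za @ zb.t() / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(za.size(0), device=za.device)
    return F.cross_entropy(logits, targets)

def train_adapter(plm_encode, loader, dim=768, epochs=3, lr=1e-4):
    """Train only the adapter; the PLM encoder (plm_encode) stays frozen.
    `loader` yields batches of (sent_a, sent_b) augmented positive pairs."""
    adapter = DebiasAdapter(dim)
    optimizer = torch.optim.Adam(adapter.parameters(), lr=lr)
    for _ in range(epochs):
        for sent_a, sent_b in loader:
            with torch.no_grad():                 # no gradients through the PLM
                ha, hb = plm_encode(sent_a), plm_encode(sent_b)
            loss = contrastive_loss(adapter(ha), adapter(hb))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return adapter
```

Because gradients never flow into the PLM, the same trained adapter can be attached to the encoder's output at inference time, which is what makes it pluggable and inexpensive to train.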

Highlights:

1) The double data augmentation strategy, by automatically searching for bias prompts, further amplifies the bias between positive sample pairs from different demographic groups, overcoming the limitation of relying on human prior knowledge.

2) The debiasing adapter does not need access to the PLM's internal structure or parameters; only the lightweight adapter parameters are trained, saving substantial computational resources without affecting the PLM's language modeling capabilities.

3) On multiple real-world datasets and evaluation metrics, both gender and racial debiasing achieve excellent and stable performance, demonstrating strong generalization abilities.

This paper also discusses the challenge of racial bias in PLMs, pointing out that current sensitive attribute word lists cannot fully cover racial biases, which is why most existing methods focus on gender bias and struggle to generalize to other social biases. The proposed debiasing strategy alleviates the reliance on human expertise to some extent, providing new insights for better addressing racial biases.