Knowledge-Enhanced Graph Topic Transformer for Explainable Biomedical Text Summarization

Application of Knowledge-Enhanced Graph Topic Transformer in Interpretable Biomedical Text Summarization

Research Background

As the volume of biomedical literature keeps growing, automatic biomedical text summarization has become increasingly important: in 2021 alone, 1,767,637 articles were published in the PubMed database. Summarization methods based on pre-trained language models (PLMs) have improved performance, but they remain limited in capturing domain-specific knowledge and in the interpretability of their results. As a consequence, generated summaries may lack coherence, contain redundant sentences, or omit important domain knowledge. In addition, the black-box nature of transformer models makes it difficult for users to understand why and how a given summary was produced. Incorporating domain-specific knowledge and interpretability into biomedical text summarization is therefore crucial for improving both accuracy and transparency.

Research Source

This paper was authored by Qianqian Xie, Prayag Tiwari (IEEE Senior Member), and Sophia Ananiadou, who are affiliated with the Department of Computer Science at the University of Manchester, the School of Information Technology at Halmstad University, and the National Centre for Text Mining at Manchester, respectively. The study was published in the IEEE Journal of Biomedical and Health Informatics, Volume 28, Issue 4, April 2024.

Research Content

Methodology

This paper proposes a new Domain Knowledge-Enhanced Graph Topic Transformer (DORIS) for interpretable biomedical text summarization. The DORIS model integrates a Graph Neural Topic Model and Unified Medical Language System (UMLS) knowledge into a transformer-based pre-trained language model.

a) Research Procedure Details

The method consists of the following steps:

1. Knowledge-Enhanced Encoder: Input documents and summaries are encoded with PLMs such as BERT to obtain contextual sentence representations. Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT) are introduced to model the semantic relationships among words and sentences.
2. Graph Construction: UMLS is used to build word association graphs and sentence association graphs, with similarities between biomedical entities obtained through SapBERT.
3. Topic Representation Generation: A GCN generates topic-word distributions from the word association graphs, and a GAT enriches sentence representations based on the sentence association graphs.
4. Domain Knowledge Integration: The topic representations of documents and sentences are integrated during summary extraction and topic inference. Finally, a classifier scores sentence importance and selects the key sentences that form the summary.
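To make the graph construction and GCN steps more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that embeddings (for example, entity vectors of the kind SapBERT could provide) are already available as plain lists of floats, links two nodes when their cosine similarity exceeds a hypothetical threshold, and applies one standard GCN propagation step. The function names, the threshold value, and the toy vectors are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(embeddings, threshold=0.5):
    """Symmetric adjacency matrix; edge weight is the similarity when it
    reaches the (assumed, positive) threshold, otherwise 0."""
    n = len(embeddings)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(embeddings[i], embeddings[j])
            if sim >= threshold:
                adj[i][j] = adj[j][i] = sim
    return adj


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, features), weight)
    return [[max(0.0, x) for x in row] for row in out]


# Toy 2-dimensional embeddings: nodes 0 and 1 are similar, node 2 is not.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = build_graph(embeddings, threshold=0.5)
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned weight matrix
hidden = gcn_layer(adj, embeddings, identity)
```

In this toy example only nodes 0 and 1 are connected, so the GCN step mixes their features while the isolated node 2 keeps its own representation. A GAT layer would replace the fixed normalized weights with learned attention coefficients.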

b) Main Outcomes

The study demonstrated that this method outperforms existing state-of-the-art PLM-based summarization methods across four biomedical literature datasets. Specifically, DORIS can utilize the Graph Neural Topic Model in the summarization process, providing model interpretability by allowing users to understand why the model selects certain sentences. Moreover, the introduction of domain-specific knowledge enables the model to more effectively identify and generate coherent topics, thereby improving summary quality.
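One way to picture this kind of explanation is sketched below. It is an illustrative assumption rather than the procedure used in the paper: given a sentence's distribution over topics and each topic's distribution over words, it ranks vocabulary words by their combined probability, so a user can see which topic words are most associated with a selected sentence. The function name, the vocabulary, and all probabilities are made up for the example.

```python
def top_topic_words(sentence_topics, topic_words, vocab, k=3):
    """Rank words by sum over topics of P(topic | sentence) * P(word | topic)."""
    scores = [
        sum(p_t * topic_words[t][w] for t, p_t in enumerate(sentence_topics))
        for w in range(len(vocab))
    ]
    ranked = sorted(range(len(vocab)), key=lambda w: scores[w], reverse=True)
    return [vocab[w] for w in ranked[:k]]


# Hypothetical vocabulary and two hypothetical topics.
vocab = ["virus", "infection", "protein", "trial"]
topic_words = [
    [0.50, 0.40, 0.05, 0.05],  # topic 0 leans toward "virus", "infection"
    [0.05, 0.05, 0.50, 0.40],  # topic 1 leans toward "protein", "trial"
]

# A sentence that mostly belongs to topic 0.
words_a = top_topic_words([0.9, 0.1], topic_words, vocab, k=2)
# A sentence that mostly belongs to topic 1.
words_b = top_topic_words([0.1, 0.9], topic_words, vocab, k=2)
```

Here `words_a` is `["virus", "infection"]` and `words_b` is `["protein", "trial"]`, showing how the same topic model yields different explanatory words for sentences with different topic mixtures.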

Datasets and Experiments

The experiments used four biomedical literature datasets: CORD-19, PubMed-Long, PubMed-Short, and S2ORC. Summary quality was assessed with ROUGE scores computed between generated and reference summaries. The experiments also examined the model’s parameter sensitivity, and validated its interpretability by measuring topic coherence and inspecting the topic words associated with selected sentences.
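For readers unfamiliar with ROUGE, the sketch below shows the n-gram overlap idea behind ROUGE-N. It is a simplified illustration that assumes lowercased whitespace tokenization and no stemming, so its numbers will not match the official ROUGE toolkit or the scores reported in the paper. The example sentences are invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap as precision, recall, F1."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented example sentences.
candidate = "the drug reduced viral load"
reference = "the drug significantly reduced viral load in patients"
rouge_1 = rouge_n(candidate, reference, n=1)
rouge_2 = rouge_n(candidate, reference, n=2)
```

In this example all five candidate unigrams appear in the eight-word reference, giving a ROUGE-1 precision of 1.0 and a recall of 0.625. For bigrams, three of the four candidate bigrams match, which lowers precision to 0.75.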

Results Analysis

The results indicate that DORIS significantly outperforms existing methods in generating interpretable and accurate biomedical literature summaries. Integrating domain knowledge into graph neural networks allows the model to better understand and distinguish domain-specific information in the biomedical field, and to generate more coherent and domain-relevant topics.

Conclusion and Significance

DORIS not only improves the accuracy and coherence of biomedical text summarization but also makes its summaries interpretable by integrating domain-specific knowledge with a Graph Neural Topic Model. This is crucial for users such as clinicians, who need to understand and trust machine-generated summaries. Future research directions include applying this interpretability framework to abstractive summarization and multi-document summarization of biomedical texts, as well as extending it to clinical note datasets.