Biomedical Relation Extraction with Knowledge Graph-Based Recommendations

Research Report on the Integration of Medical Relation Extraction and Knowledge Graph-Based Recommendations

Background Introduction

In the medical field, the explosive growth of literature makes it challenging for researchers to keep up with the latest advancements in their specific areas. From the perspective of natural language processing (NLP), automated tools can help identify and extract relevant information from unstructured text. One such task is Relation Extraction (RE), whose primary goal is to extract and categorize relationships between medical entities in text, enhancing our understanding of biomedical processes.

Currently, most state-of-the-art medical RE systems use deep learning methods and mainly target relationships between entities of the same type, such as gene-gene or drug-drug relations. However, these systems rely largely on information extracted directly from the text and neglect specialized knowledge bases such as ontologies, which are often structured as Directed Acyclic Graphs (DAGs).

On the other hand, Knowledge Graph (KG)-based recommendation systems have shown the importance of integrating additional features from KGs into item information to improve recommendation effectiveness. Typically, the users of these systems are humans, and recommended items include movies, books, etc. This work proposes integrating KG-based recommendation models into medical RE to further extend its application range.

Source Introduction

The article, titled “Biomedical Relation Extraction with Knowledge Graph-Based Recommendations,” was written by Diana Sousa and Francisco M. Couto, researchers affiliated with LASIGE at the University of Lisbon (Universidade de Lisboa), Portugal. It was published in the IEEE Journal of Biomedical and Health Informatics, Volume 26, Issue 8, in August 2022.

Research Process

This report describes a single original study that investigates how medical RE systems can integrate KG-based recommendation systems.

Dataset Preparation

The study first converts three publicly available RE datasets into a format compatible with KG recommendation systems: user-item-rating triplets. The selected datasets include PGR-Crowd (relationships between human phenotypes and genes), DDI Corpus (relationships between drugs/chemicals), and BC5CDR Corpus (interactions between drugs/chemicals and diseases).

In the PGR-Crowd dataset, the users are genes, and the items are human phenotypes; in the BC5CDR dataset, the users are drugs/chemicals, and the items are diseases; for the DDI Corpus, given that the relationships involve entities of the same type, the user and item roles were assigned by examining relationship symmetry. Each user-item pair is assigned a rating, with 1 indicating a true relationship and -1 indicating a false relationship.
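The conversion described above can be sketched in Python. This is a minimal illustration, not the authors' preprocessing code: the `AnnotatedPair` type, the `to_triplets` function, and the entity names are hypothetical placeholders, and the input is assumed to already hold one annotated entity pair per record.

```python
from typing import Iterable, List, NamedTuple, Tuple


class AnnotatedPair(NamedTuple):
    """One annotated entity pair from an RE dataset (hypothetical structure)."""
    user: str      # e.g. a gene in PGR-Crowd, a drug/chemical in BC5CDR
    item: str      # e.g. a human phenotype in PGR-Crowd, a disease in BC5CDR
    is_true: bool  # whether the annotated relationship holds


def to_triplets(pairs: Iterable[AnnotatedPair]) -> List[Tuple[str, str, int]]:
    """Map each pair to a (user, item, rating) triplet: 1 for true, -1 for false."""
    return [(p.user, p.item, 1 if p.is_true else -1) for p in pairs]


# Placeholder identifiers, not real dataset entries.
pairs = [
    AnnotatedPair("gene_A", "phenotype_X", True),
    AnnotatedPair("gene_A", "phenotype_Y", False),
]
triplets = to_triplets(pairs)
```

Here `triplets` holds one rating per user-item pair, which is the input shape a KG-based recommender expects.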

Model Training

  1. Deep Learning Model BiOnt: This model uses external knowledge sources (such as ontologies) as an additional information layer to enhance standard deep learning models. It is trained with stochastic gradient descent, which computes the loss and adjusts the weights accordingly. Key settings include the learning rate, categorical cross-entropy as the loss function, and the dropout rate for each layer.

  2. Knowledge Graph-Based Recommendation Model TUP: This model outputs a relevance score for a user-item pair, indicating how likely the user is to prefer the item. It uses a soft preference strategy that combines multiple preferences through an attention mechanism, drawing on the knowledge graph to inform recommendations. It is optimized with the Bayesian Personalized Ranking (BPR) loss.

  3. K-BiOnt Combined Model: This combines the BiOnt and TUP models, using the deep learning model to extract relationships and the recommendation model to provide supporting information. Based on an analysis of the confidence matrix, the recommendation module’s labels are taken into account, particularly when the deep learning model labels a pair false and the recommendation module labels it true.
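The ranking objective of the second model and the combination step of the third can be sketched as follows. This is a simplified reading of the description above, not the published implementation. The function names are hypothetical. The sketch assumes that a higher score means a more relevant item, that a pair receives a true recommendation label when the item is among the user's top-k items, and that the recommendation label overrides the deep learning label only when the latter is false.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Mean BPR loss, -log(sigmoid(pos - neg)), over paired scores.

    Assumption: higher scores mean more relevant items.
    """
    losses = []
    for pos, neg in zip(pos_scores, neg_scores):
        diff = pos - neg
        # -log(sigmoid(diff)) == log(1 + exp(-diff)), in a numerically stable form
        losses.append(max(-diff, 0.0) + math.log1p(math.exp(-abs(diff))))
    return sum(losses) / len(losses)


def top_k_pairs(scores, k):
    """Return the (user, item) pairs whose item is in that user's top-k by score.

    Assumption: this top-k rule is how recommendation labels are derived.
    """
    by_user = {}
    for (user, item), score in scores.items():
        by_user.setdefault(user, []).append((score, item))
    selected = set()
    for user, scored in by_user.items():
        for _, item in sorted(scored, reverse=True)[:k]:
            selected.add((user, item))
    return selected


def combine_labels(dl_label, rec_label):
    """Keep a true deep learning label; otherwise fall back on the recommendation label."""
    return dl_label or rec_label


# Placeholder scores, not results from the study.
scores = {("gene_A", "phen_X"): 0.9, ("gene_A", "phen_Y"): 0.2, ("gene_B", "phen_X"): 0.5}
recommended = top_k_pairs(scores, k=1)
final = combine_labels(False, ("gene_A", "phen_X") in recommended)
```

With this rule, raising `k` can only turn more pairs true, so the recommender can recover missed relationships but can also add false positives.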

Research Results

Deep Learning Model

The BiOnt and BioBERT deep learning models were both evaluated on the three datasets (PGR-Crowd, DDI Corpus, BC5CDR Corpus). The two models performed similarly on PGR-Crowd. On the DDI Corpus, BioBERT outperformed BiOnt, possibly because of compatibility issues with the ChEBI ontology in BiOnt.

Knowledge Graph Recommendation Model

The adapted TUP model used a soft preference strategy. Despite data sparsity, it achieved good recommendation performance on the PGR-Crowd dataset, where 100% of the item entities are linked to the HPO ontology and the recommendation module showed a significant improvement.

Combined Model Evaluation

The evaluation showed that the performance of the K-BiOnt combined model improved as the number of recommendations increased, particularly on the PGR-Crowd and BC5CDR datasets. The DDI Corpus showed no noticeable improvement, suggesting that low coverage of item entities limits the effectiveness of the recommendation module in this context.
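Item coverage, as used in this section, can be read as the share of a dataset's item entities that are linked to the ontology. The sketch below rests on that assumption and simplifies "linked" to membership in a set of ontology identifiers. The function name and the identifiers are hypothetical.

```python
def item_coverage(item_entities, ontology_ids):
    """Fraction of distinct item entities that appear among the ontology identifiers."""
    items = set(item_entities)
    if not items:
        return 0.0  # no items means nothing is covered
    return len(items & set(ontology_ids)) / len(items)


# Placeholder identifiers, not real ontology terms.
coverage = item_coverage(["t1", "t2", "t3", "t4"], {"t1", "t2", "t9"})
```

A value of 1.0 corresponds to the full-coverage case reported for PGR-Crowd. Lower values mean the recommender has no graph information for some items.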

Research Conclusions and Applications

This study opens a new direction by integrating deep learning models with KG-based recommendation systems. Incorporating existing KG knowledge into medical RE enhances the models’ ability to identify rare relationships. Although KG coverage is currently a limiting factor, the recommendation module still improved the performance of RE systems.

The study demonstrated that, in scenarios with sufficient ontology coverage, KG recommendations can effectively supplement deep learning models by discovering true relationships that those models miss.

Research Highlights

  • Innovation: This is the first time that a KG-based recommendation system has been integrated into medical RE, showcasing a new method to enhance RE systems.
  • Practicality: The recommendation system can supplement deep learning models, especially in knowledge-sparse areas.
  • Broad Prospects: Future research can expand to encompass more types of relationships and various ontologies, enhancing the system’s widespread applicability.

Other Valuable Information

Future research can extend KG integration to multiple types of relationships and explore more biomedical ontologies to enhance KG coverage. Combining the approach with KG completion techniques could also improve recommendation reliability and bring additional value to medical RE systems.

References

The article includes a rich array of references covering knowledge graphs, deep learning, medical information processing, and other fields, providing a solid theoretical foundation for the research. Noteworthy topics include:

  • The latest developments in medical ontologies such as HPO, ChEBI, and DO.
  • Developments in KG-based recommendation systems and their application in the biomedical field.
  • The latest research findings on deep learning methods in relation extraction.

This research suggests that integrating knowledge graphs with deep learning is a promising direction for medical information processing, with the potential to further improve the accuracy and comprehensiveness of information extraction.