Graph Neural Networks with Multiple Prior Knowledge for Multi-omics Data Analysis
Background Introduction
Precision medicine aims to provide personalized treatment plans for patients, improving treatment outcomes and reducing costs. Breast cancer illustrates the need: because patients differ in their clinical, pathological, and molecular characteristics, the same treatment can produce different outcomes in different patients. With the rapid advancement of biomedical technologies, diseases can now be characterized through multi-omics data. Compared to single-omics approaches, multi-omics methods can capture both consistent and complementary information across multiple data types, which supports more accurate and detailed models. For example, The Cancer Genome Atlas (TCGA) provides multi-omics data including mRNA expression, DNA methylation, and Copy Number Variation (CNV). Integrating multi-omics data has therefore become necessary for various precision medicine tasks, such as drug response prediction, gene discovery, and survival analysis.
Authors and Source
This paper was co-authored by Shunxin Xiao, Huibin Lin, Conghao Wang, and Shiping Wang (Member, IEEE), along with Jagath C. Rajapakse (Fellow, IEEE). Shunxin Xiao is from the School of Computer Science and Engineering at Nanyang Technological University and the School of Computer and Data Science at Fuzhou University. Huibin Lin and Shiping Wang are from the School of Computer and Data Science at Fuzhou University. Conghao Wang and Jagath C. Rajapakse are also from the School of Computer Science and Engineering at Nanyang Technological University. The paper was published in the IEEE Journal of Biomedical and Health Informatics in September 2023.
Research Content
Research Process
This paper proposes MPK-GNN, a multi-omics data analysis framework based on Graph Neural Networks (GNNs) that integrates multiple sources of prior knowledge, each represented as a prior graph. The method has four main modules:
- Feature-level Learning Module: Aggregates input feature information through prior graphs to generate feature-level embeddings.
- Projection Module: Maximizes consistency across prior networks by optimizing a contrastive loss.
- Sample-level Learning Module: Learns global representations through a Multilayer Perceptron (MLP).
- Task-specific Module: Flexibly extends the framework to accommodate various downstream multi-omics analysis tasks.
The framework is evaluated experimentally on the task of cancer molecular subtype classification.
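To make the feature-level aggregation concrete, here is a minimal sketch of one graph-convolution step over a single prior graph. It assumes that genes are the nodes, that each gene carries a small vector of omics values for one sample, and that the standard GCN propagation rule (self-loops plus symmetric degree normalization) is used. The summary does not specify the exact layer, so the function names `normalize_adjacency` and `gcn_layer`, the toy graph, and the weights are illustrative only.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


# Toy prior graph over three genes forming a path: gene0 - gene1 - gene2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# One sample: each gene (node) carries two omics values (assumed: mRNA, CNV).
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
# Identity weight, so the output shows the normalized neighbourhood
# aggregation only, without any learned transformation.
weight = [[1.0, 0.0],
          [0.0, 1.0]]

embedding = gcn_layer(adj, features, weight)
for gene, row in enumerate(embedding):
    print(gene, [round(v, 3) for v in row])
```

In a framework with several prior graphs, a layer like this would be applied once per graph (for example GGI, PPI, and co-expression), giving one feature-level embedding per graph.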
Main Results
In the cancer molecular subtype classification task, the reported experiments show that MPK-GNN outperforms the compared state-of-the-art algorithms, including multi-view learning methods and multi-omics integration methods, across multiple datasets. The four modules are realized as follows:
- Feature-level Learning Module: Uses a Graph Convolutional Network (GCN) to learn feature-level representations from the input multi-omics features. The prior graphs used in the experiments are a Gene-Gene Interaction (GGI) network, a Protein-Protein Interaction (PPI) network, and a Co-expression (Coexp) network.
- Projection Module: Passes the representation obtained from each prior graph through a shallow neural network and maximizes the consistency among the resulting projections.
- Sample-level Learning Module: Learns global representations of each input sample through an MLP.
- Task-specific Module: Concatenates the feature-level embeddings with the sample-level representation and uses the combined vector to solve the downstream task, such as cancer molecular subtype classification.
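The last two modules can be illustrated with a small forward pass. The sketch below assumes that each prior graph's node embeddings have already been pooled or flattened into one vector per sample, that the sample-level module is a single ReLU layer, and that the task head is a linear layer followed by softmax. All layer sizes, weights, and names (`fuse_and_classify`, `mlp`, `head`) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def linear(x, weight, bias):
    """Dense layer; `weight` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(graph_embeddings, sample_features, mlp, head):
    """Concatenate per-graph embeddings with the MLP output, then classify."""
    # Sample-level module: global representation of the raw input features.
    sample_repr = relu(linear(sample_features, *mlp))
    # Fusion: per-graph feature-level embeddings followed by the sample view.
    fused = [v for emb in graph_embeddings for v in emb] + sample_repr
    # Task-specific head: class probabilities for subtype classification.
    return fused, softmax(linear(fused, *head))


# Two pooled feature-level embeddings (assumed to come from two prior graphs).
graph_embeddings = [[0.2, 0.4], [0.1, 0.3]]
# Raw multi-omics feature vector for the same sample (toy values).
sample_features = [1.0, -1.0, 0.5]
# Hypothetical parameters: (weight, bias) for the MLP and for the head.
mlp = ([[0.5, 0.0, 1.0], [0.0, 1.0, 0.0]], [0.0, 0.0])
head = ([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]], [0.0, 0.0])

fused, probs = fuse_and_classify(graph_embeddings, sample_features, mlp, head)
print(len(fused), [round(p, 3) for p in probs])
```

Because the fused vector grows with the number of prior graphs, adding another graph changes only the input width of the head, which is one way such a design stays extensible across tasks.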
Conclusion and Significance
This study proposes MPK-GNN, a scalable end-to-end deep learning framework. According to the authors, it is the first work to introduce a contrastive learning framework into multi-omics data analysis and to use multiple prior knowledge graphs simultaneously. The experiments show that MPK-GNN improves on the compared methods in cancer molecular subtype classification. The approach is intended to improve the robustness and performance of computational models, especially when few labeled samples are available, and it can be extended to other multi-omics analysis tasks. Future work includes improving the sample-level module so that it captures input feature information better, and validating MPK-GNN on more multi-omics data analysis tasks.
Method Highlights
- Innovative Application: Introduces multiple prior graphs simultaneously into multi-omics data analysis for the first time.
- Contrastive Learning Framework: Aligns the representations learned from multiple prior knowledge graphs through a shared contrastive learning architecture.
- Superior Performance: Achieves competitive results across multiple benchmark datasets, showcasing good robustness.
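The contrastive objective named in the highlights above can be sketched as follows. This is not the paper's exact loss, which the summary does not give. It is a generic InfoNCE-style loss, swapped in for illustration, that treats the projections of the same sample under two prior graphs as a positive pair and the projections of other samples as negatives. The function name `info_nce`, the temperature value, and the toy vectors are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss; row i of each view belongs to the same sample."""
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # log-sum-exp over all candidates, computed stably.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching sample.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Projections of two samples under two prior graphs (toy values).
view_graph_1 = [[1.0, 0.0], [0.0, 1.0]]
view_graph_2 = [[0.9, 0.1], [0.1, 0.9]]

# Consistent views: each sample looks similar under both graphs.
aligned_loss = info_nce(view_graph_1, view_graph_2)
# Inconsistent views: the second view's rows are swapped between samples.
shuffled_loss = info_nce(view_graph_1, list(reversed(view_graph_2)))
print(round(aligned_loss, 3), round(shuffled_loss, 3))
```

Minimizing a loss of this kind pulls together the representations that different prior graphs assign to the same sample. With more than two prior graphs, one simple option is to sum the loss over all pairs of graphs.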
Other Information
- Datasets: Uses the TCGA Pan-Cancer dataset and the breast cancer dataset (BRCA).
- Comparison Models: Includes traditional machine learning methods (e.g., SVM, RF, KNN) and recent deep learning models (e.g., DeepMO, MOGONET, CMSC).
- Experimental Setup: Applies hyperparameter tuning and repeats the experiments multiple times to check the stability and reliability of the results.
Overall, the reported results suggest that the MPK-GNN framework is a promising approach to multi-omics data analysis and may offer new perspectives and methods for research and applications in precision medicine.