Predicting Drug-Target Affinity by Learning Protein Knowledge from Biological Networks

Background

The prediction of drug-target affinity (DTA) plays a crucial role in drug discovery. Efficient and accurate DTA prediction can significantly reduce the time and cost of developing new drugs. In recent years, rapid progress in deep learning has provided strong support for DTA prediction. Existing DTA prediction methods fall mainly into two categories: those based on 1D protein sequences and those based on 2D protein structure graphs. However, these methods focus only on the intrinsic properties of target proteins and ignore the extensive prior knowledge of protein interactions established by previous studies.

To address this gap, the paper proposes an end-to-end DTA prediction method named MSF-DTA (multi-source feature fusion-based drug-target affinity). MSF-DTA enriches each protein's representation with information from its neighboring proteins in biological networks and uses a graph pre-training framework, the Variational Graph Autoencoder (VG-AE), to learn these representations, with the goal of more accurate affinity predictions.

Source

The paper was written by Wenjian Ma, Shugang Zhang, Zhen Li, Mingjian Jiang, Shuang Wang, Nianfan Guo, Yuanfei Li, Xiangpeng Bi, Huasen Jiang, and Zhiqiang Wei. The authors' affiliations include the Qingdao campus of Ocean University of China, Qingdao University, and China University of Petroleum (East China). It was published in the IEEE Journal of Biomedical and Health Informatics in April 2023.

Research Details

Workflow

  1. Data Collection and Network Construction: The authors collected 18,552 human proteins from the SwissProt database and built two networks over them: a protein-protein interaction (PPI) network based on known protein interaction data, and a sequence similarity network (SSN).

  2. Protein Feature Representation: Each protein is described by three kinds of features: sequence encoding, subcellular localization, and protein domains. These are pre-processed and combined into a 2,045-dimensional feature vector per protein.

  3. Variational Graph Autoencoder (VG-AE): The VG-AE framework fuses the multi-source protein features over the PPI and SSN networks. A Graph Convolutional Network (GCN) encoder compresses the high-dimensional features into low-dimensional latent representations (μ), and an inner product decoder reconstructs the input graph from them.

  4. DTA Prediction: The low-dimensional latent representations serve as the protein features for DTA prediction. They are merged with drug features extracted by a 3-layer GCN, and the merged vector is passed through several fully connected layers to output the predicted affinity.
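
The encoder and decoder in step 3 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it follows the standard variational graph autoencoder formulation (a shared GCN layer followed by two heads for the mean and log-variance, then an inner product decoder), and the toy graph, the layer sizes, the random weights, and the function names normalize_adjacency and vgae_forward are all assumptions made for illustration. In the paper the input features are 2,045-dimensional; here they are 8-dimensional to keep the example small.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])   # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def vgae_forward(adj, x, w0, w_mu, w_logvar, rng=None):
    """One forward pass of a two-layer GCN encoder and inner product decoder.

    If rng is given, the latent code is sampled with the reparameterization
    trick (training); otherwise the mean mu is used directly (inference).
    """
    a_hat = normalize_adjacency(adj)
    h = np.maximum(a_hat @ x @ w0, 0.0)    # shared GCN layer with ReLU
    mu = a_hat @ h @ w_mu                  # latent means
    logvar = a_hat @ h @ w_logvar          # latent log-variances
    if rng is None:
        z = mu
    else:
        z = mu + rng.standard_normal(mu.shape) * np.exp(0.5 * logvar)
    adj_rec = 1.0 / (1.0 + np.exp(-(z @ z.T)))  # sigmoid(Z Z^T)
    return mu, logvar, adj_rec


# Toy network: 5 proteins connected in a ring (hypothetical edges).
n, d_in, d_hid, d_lat = 5, 8, 6, 4
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
x = rng.standard_normal((n, d_in))              # stand-in protein features
w0 = 0.1 * rng.standard_normal((d_in, d_hid))   # untrained, random weights
w_mu = 0.1 * rng.standard_normal((d_hid, d_lat))
w_logvar = 0.1 * rng.standard_normal((d_hid, d_lat))

mu, logvar, adj_rec = vgae_forward(adj, x, w0, w_mu, w_logvar)
print(mu.shape, adj_rec.shape)
```

After pre-training, the rows of mu would play the role of the low-dimensional protein representations that step 4 merges with the drug features. Training itself (a reconstruction loss plus a KL term) is omitted from this sketch.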

Main Results

  1. Model Performance: MSF-DTA was evaluated on two widely used DTA benchmark datasets, Davis and KIBA, using mean squared error (MSE, lower is better) and concordance index (CI, higher is better). On Davis, the MSE was 0.194 with a CI of 0.906; on KIBA, the MSE was 0.124 with a CI of 0.897. According to the reported experiments, MSF-DTA significantly outperformed existing state-of-the-art DTA prediction methods.

  2. Effectiveness of Protein Neighbor Features: By incorporating neighbor protein information from PPI and SSN networks, MSF-DTA effectively enhances protein representation and improves model prediction performance.

  3. Broad Applicability: The method also performed well on compound-protein interaction (CPI) prediction tasks, indicating that it generalizes across different tasks.
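
For readers unfamiliar with the two metrics reported above, the following sketch shows how MSE and CI are commonly computed for affinity predictions. It is a plain-Python illustration, not the evaluation code used in the paper; the function names and the sample affinity values are made up for the example. CI is computed here as the fraction of pairs with different true affinities whose predicted order matches the true order, with tied predictions counted as half.

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs that are ranked in the correct order."""
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue                      # equal true values: not comparable
            comparable += 1
            true_diff = y_true[i] - y_true[j]
            pred_diff = y_pred[i] - y_pred[j]
            if pred_diff == 0:
                concordant += 0.5             # tied prediction counts as half
            elif (true_diff > 0) == (pred_diff > 0):
                concordant += 1.0
    return concordant / comparable if comparable else float("nan")


# Made-up affinities for four drug-target pairs.
y_true = [5.0, 6.2, 7.1, 8.4]
y_pred = [5.3, 6.0, 7.5, 7.2]

print(round(mse(y_true, y_pred), 4))                 # 0.4325
print(round(concordance_index(y_true, y_pred), 4))   # 0.8333
```

In this example five of the six pairs are ordered correctly (only the last two predictions are swapped), which gives a CI of 5/6.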

Conclusion and Significance

By fusing multi-source protein features, the MSF-DTA method proposed in this study significantly improves the accuracy and efficiency of DTA prediction. The work demonstrates that high-level protein features are an effective new way to represent proteins, and that neighbor protein features drawn from PPI and SSN networks can be used to predict drug-protein interactions or affinity.

Research Highlights

  1. Introduction of Multi-Source Features: MSF-DTA combines intrinsic protein attributes and prior biological knowledge from PPI and SSN networks, providing a new perspective for DTA prediction tasks.

  2. Application of Advanced Graph Pre-Training Framework VG-AE: By utilizing the VG-AE framework, the model can better capture the topological connections between proteins, enriching protein representation.

  3. Excellent Experimental Results: MSF-DTA outperformed existing state-of-the-art methods in both DTA and CPI prediction tasks.

Conclusion

This paper proposes MSF-DTA, a new drug-target affinity prediction method based on multi-source feature fusion. By drawing on neighbor protein information from protein-protein interaction and sequence similarity networks, it improves on methods that rely only on a protein's intrinsic properties, and it offers a new approach to efficient DTA prediction in drug discovery.