Predicting circRNA–Disease Associations with Shared Units and Multi-Channel Attention Mechanisms
Background
In recent years, circular RNAs (circRNAs), a class of non-coding RNA molecules, have been shown to play a significant role in the occurrence, development, and treatment of diseases. Because their closed circular structure has no free ends, circRNAs resist degradation by exonucleases, which makes them promising candidate biomarkers and therapeutic targets. However, identifying associations between circRNAs and diseases experimentally is both time-consuming and costly, which slows progress in the field. Researchers have therefore begun developing computational models that predict circRNA-disease associations and help prioritize candidates for experimental validation.
Although multi-view learning methods have been widely applied to circRNA-disease association prediction, existing approaches often fail to fully exploit the latent information shared across views, and they overlook the fact that different views contribute unequally to the prediction. To close these gaps, a team from Harbin Institute of Technology and the University of Electronic Science and Technology of China proposed MSMCDA (Multi-view Shared Units and Multi-channel Attention Mechanisms for circRNA-Disease Association Prediction), a method that combines shared units with a multi-channel attention mechanism.
Source of the Paper
This research was conducted by Xue Zhang and Chunyu Wang from the School of Computer Science and Technology at Harbin Institute of Technology, and Quan Zou and Mengting Niu from the Institute of Fundamental and Frontier Sciences at the University of Electronic Science and Technology of China. The paper was published in 2025 in the journal Bioinformatics under the title “Predicting circRNA–disease associations with shared units and multi-channel attention mechanisms.” The source code and data have been made publicly available on GitHub for other researchers to use and improve.
Research Process and Results
1. Dataset Construction
The study used five publicly available circRNA-disease association datasets: circR2Disease, circR2Disease v2.0, circRNADisease, circ2Disease, and circRDS. These datasets contain hundreds to thousands of validated circRNA-disease associations. Two similarity networks were built for each entity type: semantic similarity and Gaussian interaction profile (GIP) kernel similarity for diseases, and functional similarity and GIP kernel similarity for circRNAs. In addition, meta-path networks were constructed to capture structural information linking circRNAs and diseases.
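As an illustration of the GIP kernel mentioned above, the following minimal sketch computes it from a binary association matrix using the widely used formulation K(i, j) = exp(-γ‖a_i - a_j‖²), where γ is the reciprocal of the mean squared norm of the interaction profiles. The bandwidth constant of 1, the function name `gip_similarity`, and the toy matrix are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def gip_similarity(assoc: np.ndarray) -> np.ndarray:
    """GIP kernel similarity between the rows (interaction profiles) of `assoc`."""
    profiles = assoc.astype(float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    # Bandwidth: 1 / mean squared profile norm (assumed bandwidth constant of 1).
    gamma = 1.0 / mean_sq_norm if mean_sq_norm > 0 else 1.0
    # Squared Euclidean distance between every pair of profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

# Toy association matrix: 4 circRNAs (rows) x 3 diseases (columns).
A = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 0],
              [0, 0, 0]])

circ_sim = gip_similarity(A)       # circRNA-circRNA similarity, shape (4, 4)
disease_sim = gip_similarity(A.T)  # disease-disease similarity, shape (3, 3)
```

Rows with identical interaction profiles (the first two circRNAs here) receive a similarity of 1, and similarity decays as the profiles diverge.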
2. Feature Extraction and Shared Unit Design
MSMCDA uses graph convolutional networks (GCNs) to extract features from both the similarity networks and the meta-path networks. To let information flow between views, the authors designed a shared unit that fuses features from the similarity views and the meta-path views through linear operation modules. This exchange allows the model to capture latent information that a single view would miss, which in turn improves prediction accuracy.
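The idea can be sketched as follows. The GCN layer uses the standard symmetric normalization of the adjacency matrix with self-loops. The `shared_unit` function is only one possible reading of "linear operation modules": a 2x2 linear mixing of the two views' features, similar in spirit to a cross-stitch unit. The mixing matrix, layer sizes, and random inputs are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCN layers."""
    adj_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(adj_hat.sum(axis=1))
    return adj_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN propagation step followed by ReLU."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

def shared_unit(h_sim: np.ndarray, h_meta: np.ndarray, mix: np.ndarray):
    """Hypothetical shared unit: each output is a linear mix of both views."""
    out_sim = mix[0, 0] * h_sim + mix[0, 1] * h_meta
    out_meta = mix[1, 0] * h_sim + mix[1, 1] * h_meta
    return out_sim, out_meta

rng = np.random.default_rng(0)
n_nodes, d_hidden = 5, 4

# Toy symmetric graphs standing in for a similarity view and a meta-path view.
sim_adj = rng.random((n_nodes, n_nodes))
sim_adj = (sim_adj + sim_adj.T) / 2
meta_adj = (rng.random((n_nodes, n_nodes)) > 0.5).astype(float)
meta_adj = np.maximum(meta_adj, meta_adj.T)

x = np.eye(n_nodes)  # one-hot node features
w_sim = rng.normal(size=(n_nodes, d_hidden))
w_meta = rng.normal(size=(n_nodes, d_hidden))

h_sim = gcn_layer(normalize_adjacency(sim_adj), x, w_sim)
h_meta = gcn_layer(normalize_adjacency(meta_adj), x, w_meta)

# In a trained model the mixing weights would be learned; fixed here for clarity.
mix = np.array([[0.9, 0.1],
                [0.1, 0.9]])
h_sim_shared, h_meta_shared = shared_unit(h_sim, h_meta, mix)
```

With an identity mixing matrix the two views stay independent; off-diagonal weights control how much each view borrows from the other.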
3. Multi-Channel Attention Mechanism
To weight the contribution of each similarity view to the prediction, the authors introduced a multi-channel attention mechanism. Each view is treated as a channel: global average pooling summarizes every channel, fully connected layers turn these summaries into per-view importance coefficients, and a convolutional neural network then integrates the reweighted views into a single feature representation. According to the reported experiments, the attention mechanism improves model performance.
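A minimal sketch of this kind of channel attention is shown below, written in the style of a squeeze-and-excitation block. The two-layer bottleneck, the sigmoid gate, and the use of a 1x1 convolution (a weighted sum over channels) for the fusion step are assumptions chosen for illustration; the paper's exact layer configuration may differ.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(views: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Importance coefficient per view; `views` has shape (n_views, n_nodes, dim)."""
    pooled = views.mean(axis=(1, 2))         # global average pooling -> (n_views,)
    hidden = np.maximum(w1 @ pooled, 0.0)    # first fully connected layer + ReLU
    return sigmoid(w2 @ hidden)              # second layer + sigmoid -> (n_views,)

def fuse_views(views, w1, w2, conv_kernel):
    """Reweight each view by its coefficient, then fuse with a 1x1 convolution."""
    coeffs = channel_attention(views, w1, w2)
    reweighted = views * coeffs[:, None, None]
    # A 1x1 convolution over the channel axis is a weighted sum of the channels.
    fused = np.tensordot(conv_kernel, reweighted, axes=(0, 0))
    return fused, coeffs

rng = np.random.default_rng(0)
n_views, n_nodes, dim, bottleneck = 3, 4, 6, 2

views = rng.random((n_views, n_nodes, dim))    # stacked per-view node features
w1 = rng.normal(size=(bottleneck, n_views))    # untrained toy weights
w2 = rng.normal(size=(n_views, bottleneck))
conv_kernel = np.full(n_views, 1.0 / n_views)  # toy 1x1 convolution kernel

fused, coeffs = fuse_views(views, w1, w2, conv_kernel)
```

In a trained model `w1`, `w2`, and `conv_kernel` would be learned jointly with the rest of the network, so views that help prediction end up with larger coefficients.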
4. Contrastive Learning
The study also adopted a contrastive learning strategy, which strengthens the feature representation by maximizing the similarity between positive pairs and minimizing the similarity between negative pairs. The authors report that this further improved the model’s ability to capture circRNA-disease associations.
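One common way to realize such an objective is the InfoNCE loss, sketched below. Here the positive pair for a node is assumed to be its own embedding in two different views, and the embeddings of all other nodes act as negatives. This pairing rule, the temperature of 0.5, and the function name `info_nce` are assumptions for illustration; the source text does not specify the loss used in MSMCDA.

```python
import numpy as np

def info_nce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.5) -> float:
    """InfoNCE loss: row i of z_a and row i of z_b form the positive pair."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature             # cosine similarities, scaled
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))     # positives sit on the diagonal

rng = np.random.default_rng(0)
z_view1 = rng.normal(size=(8, 16))
z_view2_aligned = z_view1 + 0.01 * rng.normal(size=(8, 16))  # nearly identical views
z_view2_random = rng.normal(size=(8, 16))                    # unrelated embeddings

aligned_loss = info_nce(z_view1, z_view2_aligned)
random_loss = info_nce(z_view1, z_view2_random)
```

The loss is lower when the two views agree on each node's embedding, which is the behavior the training signal rewards.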
5. Model Training and Evaluation
The model was trained with the Adam optimizer and evaluated by five-fold cross-validation. MSMCDA outperformed the baseline methods on all five datasets in terms of both AUC (area under the receiver operating characteristic curve) and AUPR (area under the precision-recall curve). For example, on the circR2Disease dataset, MSMCDA achieved an AUC of 0.976, surpassing the second-best method by 0.022.
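The evaluation protocol can be illustrated with a small, self-contained sketch. AUC is computed here directly from its definition as the probability that a random positive sample scores higher than a random negative one. The synthetic data and the class-mean scorer are placeholders standing in for real association samples and for MSMCDA itself; both are assumptions for this example.

```python
import numpy as np

def roc_auc(labels: np.ndarray, scores: np.ndarray) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())  # ties count as half

def five_fold_indices(n_samples: int, seed: int = 0):
    """Shuffle sample indices and split them into five disjoint folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), 5)

# Synthetic data: positives are shifted by +1 in every feature dimension.
rng = np.random.default_rng(0)
n_samples = 200
labels = rng.integers(0, 2, size=n_samples)
features = rng.normal(size=(n_samples, 8)) + labels[:, None]

folds = five_fold_indices(n_samples)
aucs = []
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    x_train, y_train = features[train_idx], labels[train_idx]
    # Placeholder "model": project onto the difference of the class means.
    direction = x_train[y_train == 1].mean(axis=0) - x_train[y_train == 0].mean(axis=0)
    scores = features[test_idx] @ direction
    aucs.append(roc_auc(labels[test_idx], scores))

mean_auc = float(np.mean(aucs))
```

Each sample is used for testing exactly once, and the reported figure is the average over the five held-out folds.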
6. Case Studies
To validate the model’s effectiveness in practice, case studies were conducted on colorectal cancer, gastric cancer, and non-small cell lung cancer. After known circRNA-disease associations were removed and the model was retrained, MSMCDA predicted several associations that were then confirmed through a literature review. For instance, the predicted association between circ-ZNF609 and colorectal cancer has been experimentally confirmed, which illustrates the value of MSMCDA for discovering novel circRNA-disease associations.
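The mechanics of such a case study can be sketched as follows: hide a disease's known associations before training, then rank circRNAs by their predicted scores for that disease. The helper names, the circRNA labels, and the score matrix below are hypothetical and carry no biological meaning; a retrained model would supply the real scores.

```python
import numpy as np

def mask_disease(assoc: np.ndarray, disease_idx: int) -> np.ndarray:
    """Return a copy of the association matrix with one disease's links removed."""
    masked = assoc.copy()
    masked[:, disease_idx] = 0
    return masked

def top_k_candidates(scores: np.ndarray, disease_idx: int, names, k: int = 3):
    """Rank circRNAs by predicted score for one disease, highest first."""
    order = np.argsort(-scores[:, disease_idx], kind="stable")[:k]
    return [(names[i], float(scores[i, disease_idx])) for i in order]

circ_names = ["circ_A", "circ_B", "circ_C", "circ_D"]  # hypothetical labels
assoc = np.array([[1, 0],
                  [0, 1],
                  [1, 0],
                  [0, 1]])

train_assoc = mask_disease(assoc, disease_idx=1)  # hide all links of disease 1

# Made-up scores standing in for the output of a model retrained on train_assoc.
predicted = np.array([[0.8, 0.2],
                      [0.1, 0.9],
                      [0.7, 0.5],
                      [0.3, 0.7]])

ranking = top_k_candidates(predicted, disease_idx=1, names=circ_names, k=3)
```

The top-ranked candidates are the ones that would then be checked against the published literature.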
Conclusion and Significance
By introducing shared units and a multi-channel attention mechanism, MSMCDA addresses two shortcomings of existing methods: incomplete use of multi-view information and the lack of view-specific weighting. The experimental results indicate that the method offers clear advantages in predicting circRNA-disease associations and can help identify candidate biomarkers and targets for disease diagnosis and treatment. In addition, the open-source implementation gives other researchers a practical tool for further circRNA-related studies.
Research Highlights
- Design of Shared Units: Facilitates information interaction between similarity views and meta-path views, significantly enhancing the model’s predictive capabilities.
- Multi-Channel Attention Mechanism: Adaptively adjusts the importance of different views, optimizing the feature integration process.
- Application of Contrastive Learning: Enhances feature representation, further improving model performance.
- Validation Across Multiple Datasets: Experiments on five public datasets demonstrate the robustness and generalizability of MSMCDA.
- Practical Application Value: Case studies validate the model’s utility in discovering novel circRNA-disease associations.
Future Outlook
Although MSMCDA has achieved notable results, the research team also identified limitations. The current number of meta-paths is small, and adding more meta-paths could capture more comprehensive structural information. Integrating additional types of biological data (e.g., gene expression data and protein interaction data) could also improve predictive performance. The team plans to explore both directions in future work.