Multi-Modal Interpretable Representation for Non-Coding RNA Classification and Class Annotation
Non-coding RNAs (ncRNAs) play critical roles in cellular processes and disease development. Although genome sequencing projects have revealed a vast number of non-coding genes, the functional classification of ncRNAs remains a complex and challenging issue. The diversity, complexity, and functionality of ncRNAs make them important subjects in biomedical research, particularly in the discovery of biomarkers and therapeutic targets. However, most existing ncRNA classification tools rely on only one or two types of data (e.g., sequence or secondary structure), ignoring other potentially important data sources. Additionally, existing methods often lack interpretability, making it difficult to reveal the characteristics of different ncRNA classes.
To address these issues, a research team from Université Paris-Saclay and Institut Curie proposed a multi-modal deep learning model named MMNC (Multi-Modal Interpretable Representation for Non-Coding RNA Classification and Class Annotation). This model integrates sequence, secondary structure, and expression data to achieve efficient classification of ncRNAs and provides an interpretable attention mechanism that reveals the importance of different modalities in classification.
Source of the Paper
The paper was co-authored by Constance Creux, Farida Zehraoui, François Radvanyi, and Fariza Tahi, affiliated with Université Paris-Saclay and Institut Curie. It was published on January 31, 2025, in the journal Bioinformatics, titled “MMNC: Multi-Modal Interpretable Representation for Non-Coding RNA Classification and Class Annotation.”
Research Process and Details
1. Research Objectives and Method Overview
The core objective of MMNC is to develop a multi-modal deep learning model capable of integrating sequence, secondary structure, and expression data to classify ncRNAs and provide an interpretable attention mechanism. The model employs an intermediate fusion strategy, using attention mechanisms to quantify the contribution of different modalities to classification and handle missing data.
2. Modality Encoding
The MMNC model first encodes each modality independently to extract meaningful information:
- Sequence Encoding: Convolutional Neural Networks (CNNs) or Transformer models encode the ncRNA sequences. The CNN consists of multiple convolutional blocks, each comprising a convolutional layer, Leaky ReLU activation, batch normalization, max pooling, and dropout. The Transformer is based on the pre-trained DNABERT model, extracting sequence features through transfer learning.
- Secondary Structure Encoding: RNA secondary structures are represented as graphs and encoded with Graph Neural Networks (GNNs). The GNN consists of multiple graph convolutional blocks, each comprising a graph convolutional layer, Leaky ReLU activation, batch normalization, and dropout.
- Expression Encoding: Multi-Layer Perceptrons (MLPs) encode the expression data. The MLP consists of multiple fully connected layers, each with ReLU activation, batch normalization, and dropout.
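As an illustration of the sequence branch, the sketch below builds the kind of convolutional block described above (convolution, Leaky ReLU, batch normalization, max pooling, dropout) in PyTorch. All hyperparameters (kernel size, channel counts, embedding dimension) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel=5, dropout=0.2):
    # One convolutional block as described in the text: convolution,
    # Leaky ReLU, batch normalization, max pooling, dropout.
    # Hyperparameter values here are illustrative assumptions.
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=kernel, padding=kernel // 2),
        nn.LeakyReLU(),
        nn.BatchNorm1d(out_ch),
        nn.MaxPool1d(2),
        nn.Dropout(dropout),
    )

class SequenceCNN(nn.Module):
    """Stacked conv blocks over one-hot encoded RNA sequences (A, C, G, U)."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.blocks = nn.Sequential(
            conv_block(4, 32),
            conv_block(32, 64),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)   # collapse variable lengths
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x):                     # x: (batch, 4, seq_len)
        h = self.pool(self.blocks(x)).squeeze(-1)  # (batch, 64)
        return self.proj(h)                        # (batch, embed_dim)

enc = SequenceCNN()
z = enc(torch.randn(8, 4, 200))   # batch of 8 one-hot sequences, length 200
print(z.shape)                    # torch.Size([8, 128])
```

The structure and expression encoders follow the same pattern, swapping the convolutional blocks for graph convolutional or fully connected blocks.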
3. Attention Mechanism and Modality Fusion
After modality encoding, MMNC fuses the modalities using an attention mechanism, in three steps:
- Modality Projection: The representation of each modality is projected into a feature space of the same dimension.
- Attention Calculation: Interaction matrices between modalities are computed through cross-attention, yielding attention coefficients that quantify the importance of each modality.
- Handling Missing Data: A masking mechanism ignores the attention coefficients of missing modalities, ensuring that all available data is utilized.
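The three steps above can be sketched in NumPy as follows. This is a simplified stand-in, not the paper's exact formulation: random matrices play the role of learned projections, and a single scoring vector stands in for the cross-attention interaction matrices. The key point it demonstrates is the masking of missing modalities before the softmax.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_modalities(reps, available, d_proj=16, seed=0):
    """Attention-based fusion over modality representations (a sketch,
    not MMNC's exact formulation).

    reps      : list of 1-D arrays, one per modality (sizes may differ)
    available : boolean list; False marks a missing modality
    """
    rng = np.random.default_rng(seed)
    # Modality projection: map each representation to a shared dimension
    # (random matrices stand in for learned projections).
    projected = [rng.standard_normal((d_proj, r.size)) @ r for r in reps]
    # Attention calculation: score each projected modality (a single
    # query vector stands in for the cross-attention interaction matrix).
    query = rng.standard_normal(d_proj)
    scores = np.array([p @ query for p in projected])
    # Handling missing data: mask the scores of absent modalities so their
    # attention coefficients are exactly zero after the softmax.
    scores = np.where(available, scores, -np.inf)
    coeffs = softmax(scores)
    fused = sum(c * p for c, p in zip(coeffs, projected))
    return fused, coeffs

seq    = np.random.default_rng(1).standard_normal(32)
struct = np.random.default_rng(2).standard_normal(24)
expr   = np.random.default_rng(3).standard_normal(10)
fused, coeffs = fuse_modalities([seq, struct, expr], [True, True, False])
print(coeffs)   # third coefficient is exactly 0: the missing modality is ignored
```

Because the mask sets missing-modality scores to negative infinity, the softmax renormalizes the remaining coefficients over the available modalities only, so every sample is classified with whatever data it has.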
4. Classification Task
The fused modality representations are used for classification tasks. MMNC employs a multi-layer fully connected network for final classification and uses the cross-entropy loss function for training.
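A minimal stand-in for this classification head is shown below: the fused representation passes through fully connected layers and is trained with cross-entropy. The layer sizes and the number of classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

n_classes, d_fused = 6, 128   # illustrative values, not the paper's

# Multi-layer fully connected classifier over the fused representation.
head = nn.Sequential(
    nn.Linear(d_fused, 64),
    nn.ReLU(),
    nn.Linear(64, n_classes),
)

fused  = torch.randn(8, d_fused)              # a batch of fused representations
labels = torch.randint(0, n_classes, (8,))    # ncRNA class labels
logits = head(fused)
loss = nn.CrossEntropyLoss()(logits, labels)  # training objective
```

In training, this loss would be backpropagated through both the fusion module and the modality encoders, so the encoders learn class-discriminative representations end to end.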
Main Results
1. Comparison of Modality Encoders
The research team compared the performance of different modality encoders:
- Sequence Encoding: The CNN2 model performed best across all three datasets, with accuracies of 0.951, 0.980, and 0.966, respectively.
- Secondary Structure Encoding: The GNN based on SAGE convolutions performed best across the three datasets, with accuracies of 0.797, 0.831, and 0.944, respectively.
- Expression Encoding: The MLP1 model performed best on dataset D3, achieving an accuracy of 0.790.
2. Ablation Study on Modality Contributions
Through ablation studies, the research team found:
- Single-Modality Performance: The sequence modality achieved the highest classification accuracy, followed by the secondary structure and expression modalities.
- Multi-Modality Performance: Combining modalities significantly improved classification. On dataset D3, for example, the three-modality combination reached an accuracy of 0.982, higher than any single- or dual-modality combination.
3. Interpretability of the Attention Mechanism
The attention mechanism makes the classification results interpretable. On dataset D3, for example:
- lncRNA: The expression modality was the primary contributor to classification, reflecting the tissue-specific expression patterns of lncRNAs.
- miRNA: The sequence modality was the primary contributor to classification, reflecting the specific sequence patterns of miRNA precursors.
- snoRNA: Both the sequence and expression modalities contributed significantly, reflecting the conserved sequences and expression features of the snoRNA family.
4. Comparison with Existing Methods
MMNC outperformed existing ncRNA classification tools on all three datasets. On dataset D1, for example, it achieved an accuracy of 0.953, significantly higher than other tools (e.g., ncRNA-Deep at 0.914 and RNAGCN at 0.851).
Conclusions and Significance
MMNC introduces a novel multi-modal deep learning framework that classifies ncRNAs efficiently and, through its interpretable attention mechanism, reveals the importance of each modality for classification. The model offers the following scientific and application value:
- Scientific Value: By integrating multi-modal data, MMNC describes ncRNA characteristics more comprehensively, advancing our understanding of ncRNA functions.
- Application Value: MMNC's high classification performance and interpretability make it widely applicable to biomarker discovery and disease mechanism research.
Research Highlights
- Multi-Modal Integration: MMNC is the first to integrate sequence, secondary structure, and expression data, providing a richer description of ncRNAs.
- Interpretability: Through the attention mechanism, MMNC reveals the contribution of different modalities to classification, enhancing model interpretability.
- Handling Missing Data: MMNC effectively handles missing data, ensuring that all available information is utilized.
Future Directions
The research team plans to expand the scope of MMNC to explore inter-class similarities and the discovery of novel ncRNA classes, further advancing the ncRNA classification framework.