Determining Structures of RNA Conformers Using AFM and Deep Neural Networks

Academic Background

RNA (ribonucleic acid) is a crucial molecule in living organisms, involved in biological processes such as gene expression, regulation, and catalysis. Although a large fraction of the human genome is transcribed into RNA, determining the structures of RNA molecules remains a major challenge. RNA molecules are typically flexible and conformationally heterogeneous. These properties are essential to their function, but they also limit traditional structure-determination methods such as nuclear magnetic resonance (NMR), X-ray crystallography, and cryo-electron microscopy (cryo-EM). For large RNAs in particular, two factors prevent protein structure prediction methods such as AlphaFold from being applied directly: the molecules' conformational diversity and the lack of a large-scale RNA structure database. Accurately determining the three-dimensional structures of large RNA molecules, and in particular capturing their conformational heterogeneity, therefore remains a central problem in RNA structural biology.

Source of the Paper

The paper was written by Maximilia F. S. Degenhardt, Hermann F. Degenhardt, Yuba R. Bhandari, and colleagues at several research institutions, including the National Cancer Institute (NCI) and the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). It was published in Nature in 2024 under the title "Determining structures of RNA conformers using AFM and deep neural networks."

Research Process

1. Research Objectives and Method Overview

The study proposes a new method, HORNET, that combines atomic force microscopy (AFM), unsupervised machine learning, and deep neural networks (DNNs) to determine the three-dimensional topological structures of RNA molecules. HORNET first uses AFM images to capture high-resolution topological information on individual RNA molecules in solution. It then analyzes this information with machine learning and deep learning algorithms to reconstruct each molecule's three-dimensional structure.

2. Experimental Process

a) AFM Image Acquisition and Processing

The study first used AFM to obtain high-resolution topological images of individual RNA molecules. A key advantage of AFM is its high signal-to-noise ratio for single molecules, which makes it possible to capture the structural features of large RNAs in different conformations. The researchers estimated the noise level and analyzed the resolution of the AFM images to confirm that they were of sufficient quality for structural reconstruction.
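The source does not say how the noise was estimated. One common robust approach is shown here for illustration only. It takes the median absolute deviation (MAD) of background pixels and scales it by 1.4826, which approximates the standard deviation of Gaussian noise. The helper name `estimate_background_noise`, the fixed height threshold that separates substrate from molecule, and the toy height map are all hypothetical.

```python
import statistics


def estimate_background_noise(height_map, background_threshold):
    """Robust noise estimate (same units as the height map) from background pixels.

    Hypothetical helper: pixels below `background_threshold` are treated as
    substrate. Their median absolute deviation is scaled by 1.4826 so that it
    approximates the standard deviation of Gaussian noise.
    """
    background = [h for row in height_map for h in row if h < background_threshold]
    if len(background) < 2:
        raise ValueError("not enough background pixels to estimate noise")
    med = statistics.median(background)
    mad = statistics.median(abs(h - med) for h in background)
    return 1.4826 * mad


# Toy 4x4 height map (nm): a slightly noisy flat substrate with a ~2-nm-high feature.
image = [
    [0.10, -0.10, 0.05, -0.05],
    [0.00, 2.00, 2.10, 0.10],
    [-0.10, 1.90, 2.00, 0.00],
    [0.05, -0.05, 0.10, -0.10],
]
sigma = estimate_background_noise(image, background_threshold=1.0)
print(f"estimated noise: {sigma:.3f} nm")  # estimated noise: 0.111 nm
```

MAD is used rather than a plain standard deviation so that a few stray molecule-edge pixels below the threshold do not inflate the estimate.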

b) Dynamic Fitting and Model Generation

The researchers used coarse-grained molecular dynamics simulations to fit RNA models to the AFM data ("dynamic fitting"), generating a large number of conformational models. The models were restrained by topological information from the AFM images so that they remained consistent with the experimental data. During fitting, AFM pseudo-potentials were combined with a classical Gibbs free energy description to drive the models toward agreement with the experimental data.
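The exact form of the AFM pseudo-potential is not given here, so the sketch below assumes a simple version. Coarse-grained beads are rendered into a simulated height map, where each pixel takes the highest bead surface above it. The pseudo-energy is the sum of squared differences between this map and the experimental height map. The total energy is a model energy term plus a weighted AFM term. All function names, the weight `w_afm`, the bead radius, and the grid are hypothetical. The code only evaluates the objective; an MD engine would also need its gradient.

```python
import math


def simulated_height_map(beads, nx, ny, pixel_size, bead_radius):
    """Render coarse-grained beads (x, y, z bead centres) into an ny-by-nx height map.

    Each pixel takes the highest point of any bead sphere above its centre;
    pixels not covered by a bead stay at 0 (the substrate).
    """
    heights = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            px, py = (i + 0.5) * pixel_size, (j + 0.5) * pixel_size
            for x, y, z in beads:
                d2 = (px - x) ** 2 + (py - y) ** 2
                if d2 <= bead_radius ** 2:
                    top = z + math.sqrt(bead_radius ** 2 - d2)
                    heights[j][i] = max(heights[j][i], top)
    return heights


def afm_pseudo_energy(simulated, experimental):
    """Sum of squared height differences between two maps of equal shape."""
    return sum(
        (s - e) ** 2
        for srow, erow in zip(simulated, experimental)
        for s, e in zip(srow, erow)
    )


def total_energy(model_energy, beads, experimental, pixel_size, bead_radius, w_afm):
    """Hypothetical combined objective: model (free-)energy plus weighted AFM term."""
    ny, nx = len(experimental), len(experimental[0])
    sim = simulated_height_map(beads, nx, ny, pixel_size, bead_radius)
    return model_energy + w_afm * afm_pseudo_energy(sim, experimental)


# Toy check: one bead of radius 1 nm resting on the substrate at the centre of a 3x3 grid.
beads = [(1.5, 1.5, 1.0)]
exp_map = [[0.0, 1.0, 0.0], [1.0, 2.5, 1.0], [0.0, 1.0, 0.0]]
print(total_energy(-10.0, beads, exp_map, pixel_size=1.0, bead_radius=1.0, w_afm=2.0))  # -9.5
```

A practical fitting protocol would probably also need to account for AFM tip convolution and image registration, which this sketch ignores.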

c) Unsupervised Machine Learning (UML) and Model Screening

The researchers then used unsupervised machine learning (UML) to cluster and screen the models produced by dynamic fitting. The UML algorithm combined energy information, AFM topological information, and the hierarchical folding principle of RNA to select the models that best matched the experimental data. Using principal component analysis (PCA) and clustering, the researchers then selected the models with the lowest energy and the best fit to the AFM images from this large pool.
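The paper's UML algorithm also uses energy and hierarchical-folding information in ways not detailed here. The sketch below shows only a generic PCA-plus-clustering step under stated assumptions:

- each model is described by a small hypothetical feature vector, for example a few inter-helix distances;
- features are standardized and projected onto their top principal components;
- the projected models are clustered with k-means;
- the lowest-energy model in each cluster is kept, and these representatives are ranked by a hypothetical AFM misfit score.

The code is pure Python so that it stays self-contained. In practice, a library such as scikit-learn would normally supply PCA and k-means.

```python
import math
import random


def standardize(rows):
    """Center each feature column and scale it to unit standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    stds = [math.sqrt(sum((r[k] - means[k]) ** 2 for r in rows) / (n - 1)) or 1.0
            for k in range(d)]
    return [[(r[k] - means[k]) / stds[k] for k in range(d)] for r in rows]


def pca_project(rows, n_components=2, n_iter=500, seed=0):
    """Project standardized rows onto their top principal components
    (power iteration with deflation on the covariance matrix)."""
    x = standardize(rows)
    n, d = len(x), len(x[0])
    cov = [[sum(r[a] * r[b] for r in x) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [t / norm for t in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(r[k] * c[k] for k in range(d)) for c in components] for r in x]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def screen_models(descriptors, energies, misfits, k):
    """Hypothetical screening: cluster in PCA space, keep the lowest-energy
    model of each cluster, then rank those representatives by AFM misfit."""
    labels = kmeans(pca_project(descriptors), k)
    reps = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            reps.append(min(idx, key=lambda i: energies[i]))
    return sorted(reps, key=lambda i: misfits[i])


descriptors = [
    [10.0, 20.0, 5.0], [10.5, 19.5, 5.2], [9.8, 20.3, 4.9],   # toy conformer family A
    [30.0, 8.0, 15.0], [29.5, 8.4, 14.8], [30.4, 7.7, 15.3],  # toy conformer family B
]
energies = [-50.0, -52.0, -49.0, -47.0, -46.0, -48.5]
misfits = [3.0, 2.5, 3.2, 1.8, 2.0, 1.9]
print(screen_models(descriptors, energies, misfits, k=2))  # [5, 1]
```

Standardizing before PCA keeps features with large numeric ranges from dominating the principal components.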

d) Deep Neural Network (DNN) and Accuracy Estimation

To estimate how accurate each model is, the researchers developed a deep neural network (DNN) that predicts the root-mean-square deviation (RMSD) between a model and the true structure. The DNN was trained on a database (PSDatabase) of 3.5 million RNA structural models. Training and validation showed that the network could reliably estimate the accuracy of models for RNAs whose structures are unknown, particularly for models with RMSDs below 7 Å.
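The DNN itself is not reproduced here, but the quantity it estimates can be made concrete. The sketch computes RMSD between matched points after optimal rigid-body superposition. The matching scheme (for example, one coarse-grained bead per nucleotide) is an assumption. It uses Horn's quaternion method: λ_max, the largest eigenvalue of a 4x4 key matrix built from the cross-covariance of the centred coordinates, gives RMSD = sqrt((G_A + G_B - 2 λ_max) / N). Here G_A and G_B are the sums of squared centred coordinates of the two structures. The function name and the power-iteration eigensolver are illustrative choices. Theobald's QCP method solves the same eigenvalue problem more robustly.

```python
import math


def superposed_rmsd(model, reference, max_iter=10000, tol=1e-12):
    """RMSD between two equal-length lists of (x, y, z) points after optimal
    rigid-body superposition (Horn's quaternion method).

    The largest eigenvalue of Horn's 4x4 key matrix is found by power iteration
    on a shifted copy of the matrix whose eigenvalues are all non-negative.
    """
    n = len(model)
    if n == 0 or n != len(reference):
        raise ValueError("structures must be non-empty and of equal length")

    def centered(points):
        c = [sum(p[k] for p in points) / n for k in range(3)]
        return [[p[k] - c[k] for k in range(3)] for p in points]

    a, b = centered(model), centered(reference)
    g = sum(v * v for p in a + b for v in p)  # G_A + G_B
    if g == 0.0:
        return 0.0
    # Cross-covariance S[i][j] = sum over points of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    shift = g / 2.0  # every eigenvalue of `key` lies in [-g/2, g/2]
    m = [[key[i][j] + (shift if i == j else 0.0) for j in range(4)] for i in range(4)]
    v = [1.0, 0.3, 0.2, 0.1]  # generic start vector
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
        new_lam = sum(v[i] * m[i][j] * v[j] for i in range(4) for j in range(4))
        converged = abs(new_lam - lam) <= tol * g
        lam = new_lam
        if converged:
            break
    lam_max = lam - shift
    return math.sqrt(max(0.0, (g - 2.0 * lam_max) / n))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
# Same shape rotated 90 degrees about z and translated: RMSD should be ~0.
model = [(5.0 - y, -1.0 + x, 2.0 + z) for x, y, z in reference]
print(f"{superposed_rmsd(model, reference):.4f}")
```

A rigid rotation plus translation should give an RMSD of essentially zero, which makes a quick sanity check for any RMSD routine used to label training data.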

e) Validation and Application

The researchers applied HORNET to several RNA molecules, including RNase P RNA and the HIV-1 Rev response element (RRE) RNA. From AFM images, they determined the structures of multiple conformers of these RNAs, showing that the method can resolve the conformational heterogeneity of large RNA molecules.

Main Results

  1. AFM Images and Structural Reconstruction: The researchers successfully captured multiple conformations of RNase P RNA and HIV-1 RRE RNA using AFM and reconstructed the three-dimensional structures of these RNAs using the HORNET method. The reconstructed structures had RMSDs of 3-6 Å compared to known crystal structures, indicating that the HORNET method can accurately resolve the topological structures of RNA.

  2. Effectiveness of Unsupervised Machine Learning: Through unsupervised machine learning, the researchers screened out models that best matched the experimental data from a large number of dynamically fitted models. These models had RMSDs around 5 Å, demonstrating that the UML algorithm can effectively screen high-quality structural models.

  3. Accuracy Estimation by Deep Neural Network: The DNN accurately estimated the RMSD between each model and the true structure, particularly for RMSDs below 7 Å. Its accuracy estimates agreed closely with the UML screening results, further supporting the reliability of HORNET.

  4. Structural Resolution of HIV-1 RRE RNA: The researchers resolved multiple conformers of HIV-1 RRE RNA and found that the distance between the Rev-binding sites varied substantially among them. This finding provides new insight into how the HIV-1 Rev protein specifically recognizes the RRE.
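The variation in distance between two binding sites across conformers can be quantified one model at a time. The sketch below assumes each conformer is stored as a mapping from residue number to (x, y, z) coordinates and each binding site is a list of residues. It measures the distance between the two site centroids. The residue numbers and coordinates are placeholders, not the actual RRE Rev-binding sites, and the centroid distance is only one possible choice of metric.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))


def site_distance(conformer, site_a, site_b):
    """Distance between the centroids of two residue sets in one conformer.

    `conformer` maps residue number -> (x, y, z); `site_a` and `site_b`
    are lists of residue numbers (hypothetical placeholders here).
    """
    ca = centroid([conformer[r] for r in site_a])
    cb = centroid([conformer[r] for r in site_b])
    return math.dist(ca, cb)


# Placeholder sites and two toy conformers (coordinates in angstroms).
SITE_A, SITE_B = [1, 2], [10, 11]
conformers = [
    {1: (0.0, 0.0, 0.0), 2: (2.0, 0.0, 0.0), 10: (40.0, 0.0, 0.0), 11: (42.0, 0.0, 0.0)},
    {1: (0.0, 0.0, 0.0), 2: (2.0, 0.0, 0.0), 10: (0.0, 30.0, 0.0), 11: (2.0, 30.0, 0.0)},
]
distances = [site_distance(c, SITE_A, SITE_B) for c in conformers]
print(distances)  # [40.0, 30.0]
print(max(distances) - min(distances))  # 10.0
```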

Conclusions and Significance

The HORNET method, by combining AFM, unsupervised machine learning, and deep neural networks, successfully addresses the challenge of resolving conformational heterogeneity in large RNA molecules. This method not only captures high-resolution topological information of RNA molecules but also accurately reconstructs their three-dimensional structures through machine learning algorithms. The introduction of the HORNET method provides a new tool for RNA structural biology research and is expected to accelerate our understanding of the RNA conformational space, with broad application prospects in RNA function and RNA-targeted drug design.

Research Highlights

  1. Innovative Method: The HORNET method is the first to combine AFM, unsupervised machine learning, and deep neural networks for resolving the three-dimensional structures of RNA, filling a gap in the field of RNA structural determination.
  2. High-Accuracy Structural Reconstruction: Using the HORNET method, researchers can reconstruct the topological structures of RNA with high accuracy, with RMSDs in the range of 3-6 Å, demonstrating the method’s powerful capability in RNA structural resolution.
  3. Broad Application Prospects: HORNET applies not only to RNA molecules with known structures but also to RNAs whose structures are unknown, providing a new tool for RNA function research and drug design.

Other Valuable Information

The researchers also designed a novel branched peptide that mimics the dimer structure of the HIV-1 Rev protein and demonstrated its high specificity in binding to RRE RNA. This discovery provides new insights for the development of novel anti-HIV drugs.