Accelerating Ionizable Lipid Discovery for mRNA Delivery Using Machine Learning and Combinatorial Chemistry

Research Background

To fully realize the potential of mRNA therapies, it is essential to expand the toolkit of lipid nanoparticles (LNPs). However, a key bottleneck in LNP development is identifying new ionizable lipids. Previous studies have shown that LNPs can deliver mRNA effectively to specific tissues or cells. Classic LNP formulations typically consist of an ionizable lipid, cholesterol, helper lipids, and a polyethylene glycol-lipid (PEG-lipid); of these, the ionizable lipid plays the crucial role in mRNA loading and endosomal escape.

In recent years, LNPs have made significant strides in clinical applications. For example, the U.S. Food and Drug Administration (FDA) approved Onpattro, the first small interfering RNA (siRNA) drug, for hereditary transthyretin-mediated amyloidosis, as well as two SARS-CoV-2 mRNA vaccines, one developed by Moderna and the other by Pfizer/BioNTech. Nonetheless, each FDA-approved LNP formulation contains a unique ionizable lipid. Because lipid synthesis still relies largely on conventional chemical reaction platforms, accelerating the discovery of new mRNA delivery lipids remains a significant challenge.

Paper Information

The research paper, titled “Accelerating ionizable lipid discovery for mRNA delivery using machine learning and combinatorial chemistry,” was authored by Bowen Li, Idris O. Raji, Akiva G. R. Gordon, and others, affiliated with institutions including MIT, Boston Children’s Hospital, the University of Michigan, and the University of Toronto. It was published in Nature Materials in March 2024.

Research Process

Research Steps

  1. Creation of Chemically Diverse Library: The researchers used a simple four-component reaction (4CR) platform to create a library of 584 chemically diverse ionizable lipids. Each lipid is assembled from four reactants: an amine, an isocyanide, an aldehyde, and a carboxylic acid.

  2. Screening LNPs and Building Base Dataset: LNPs formulated with these lipids were screened for mRNA transfection efficiency, and the resulting data served as the foundational dataset for training various machine learning models.

  3. Training and Selection of Machine Learning Models: Using the mRNA transfection results of the 584 lipids, three types of machine learning (ML) algorithm were trained: Random Forest, Logistic Regression, and Gradient Boosting. The XGBoost implementation of gradient boosting performed best. Random partitioning and over-sampling were applied to mitigate potential biases in the ML algorithms, and molecular descriptors generated by the PaDEL-Descriptor software represented the chemical structure of each lipid.

  4. Screening Virtual Lipid Libraries and Experimental Validation: The best-performing model was used to explore a virtual library containing 40,000 lipids, and the top 16 lipids from this selection were synthesized and experimentally validated.

  5. Discovery and Performance Evaluation of Novel Lipid 119-23: This process identified lipid 119-23, which transfected muscle and immune cells in several tissues more efficiently than established benchmark lipids.
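
The random partitioning and over-sampling mentioned in step 3 can be illustrated with a minimal sketch. It assumes a binary "hit" label per lipid and plain random over-sampling of the minority class in the training portion only; the function name, the 20% test fraction, and the toy data are hypothetical, and the paper's exact procedure may differ.

```python
import random
from collections import Counter

def split_and_oversample(samples, labels, test_fraction=0.2, seed=0):
    """Randomly partition labelled samples, then over-sample smaller classes
    in the training portion only, so the test set stays untouched."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)

    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]

    # Group training rows by class label.
    by_class = {}
    for row in train:
        by_class.setdefault(row[1], []).append(row)

    # Duplicate randomly chosen rows of smaller classes until every class
    # present in the training set matches the size of the largest one.
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        balanced.extend(rows + extra)
    rng.shuffle(balanced)
    return balanced, test

# Toy data (hypothetical): 100 lipids, 10 of them labelled as hits (1).
samples = [f"lipid_{i}" for i in range(100)]
labels = [1 if i < 10 else 0 for i in range(100)]
train, test = split_and_oversample(samples, labels)
train_counts = Counter(label for _, label in train)
```

Over-sampling after the split, rather than before it, keeps duplicated rows from leaking into the held-out set and inflating the measured performance.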

Methods and Experimental Details

  1. Selection of Chemical Components and Reaction Design: The research team combined three amine variants (head groups), four isocyanide variants (linker groups), eight aldehyde variants (tail group 1), and four carboxylic acid variants (tail group 2) using the 4CR system, synthesizing 584 chemically diverse ionizable lipids.

  2. Specific Application of Machine Learning Algorithms: The machine learning models used 2,014 generated molecular descriptors to predict lipid performance in mRNA delivery. The XGBoost model achieved the best area under the receiver operating characteristic curve (ROC-AUC) and the best area under the precision-recall curve (PR-AUC), and was therefore chosen as the predictive model.

  3. Experimental Validation and Optimization: Sixteen new lipids were synthesized and tested for transfection efficiency in mice by intramuscular injection. Lipid 119-23 outperformed the control lipids, especially in muscle and various immune cells.
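
To make the two model-selection metrics concrete, here is a minimal pure-Python sketch. It assumes binary labels (1 for a hit, 0 otherwise) with both classes present, and it uses average precision as the PR-AUC estimate, which is a common choice but not necessarily the one used in the paper. The function names and the toy scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count one half); this equals the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Step-wise summary of the precision-recall curve: the mean of the
    precision measured at the rank of each true positive."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy predictions for six hypothetical lipids.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(y_true, scores)            # 5/9, about 0.556
ap = average_precision(y_true, scores)   # 13/18, about 0.722
```

When hits are rare, PR-AUC is usually the more informative of the two, because ROC-AUC can stay high even if most top-ranked predictions are false positives.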

Research Results

Main Results

  1. Preliminary Screening and Dataset Construction: Through high-throughput screening, the research team obtained mRNA transfection data for 584 ionizable lipids both in HeLa cells and in mice, constructing the base dataset.

  2. Selection of Best Model and Large-scale Screening: The XGBoost model outperformed others and was used to screen a virtual library of 40,000 lipids, identifying the top 16 promising lipids for further experimental validation.

  3. Discovery and Validation of Lipid 119-23: Lipid 119-23 demonstrated transfection efficiency superior to benchmark lipids across various tissues, especially in muscle and immune cells.
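
The virtual screening step in result 2 amounts to scoring every candidate and keeping the best few. The sketch below assumes a trained model is available as a scoring function; `toy_score` is a hypothetical stand-in for that model, and the candidate names are placeholders rather than real lipid identifiers.

```python
import heapq

def top_k_candidates(candidates, score_fn, k=16):
    """Return the k highest-scoring candidates as (score, candidate) pairs,
    best first, without sorting the whole library."""
    scored = ((score_fn(c), c) for c in candidates)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

def toy_score(name):
    """Hypothetical stand-in for a model's predicted hit probability."""
    index = int(name.split("_")[1])
    return (index * 7919 % 40000) / 40000

# Placeholder virtual library of 40,000 candidate names.
virtual_library = [f"virtual_{i}" for i in range(40000)]
top16 = top_k_candidates(virtual_library, toy_score, k=16)
```

In practice `score_fn` would wrap the chosen classifier's predicted probability for a lipid's descriptor vector.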

Research Conclusions

Conclusions and Significance

By combining machine learning with 4CR chemistry, the research team developed a rapid and efficient screening method for ionizable lipids, significantly shortening the discovery cycle for new mRNA delivery lipids. Lipid 119-23 improved mRNA transfection efficiency across various cell types, demonstrating broad application potential.

Highlights and Novelty

  1. Innovative Combinatorial Chemistry Platform: The 4CR platform enabled high-throughput screening of ionizable lipids and, compared with three-component reactions, increased synthesis efficiency and yield.
  2. Application of Machine Learning in Molecular Screening: The integration of machine learning technology improved the efficiency of large-scale compound library screening. The XGBoost model performed best in the screening process.
  3. Discovery of Novel Lipid 119-23: Lipid 119-23 delivered mRNA significantly better than commercial benchmark lipids across various tissues, indicating its potential for vaccines and protein replacement therapies.
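
As an illustration of how a combinatorial platform like the one in highlight 1 spans a library, the following sketch enumerates every combination of four component pools. The building-block labels and pool sizes are hypothetical placeholders; a real library would use chemical structures (for example SMILES strings) and might not include every combination.

```python
from itertools import product

# Hypothetical building-block labels standing in for real reactants.
amines = ["A1", "A2"]               # head groups
isocyanides = ["I1", "I2"]          # linker groups
aldehydes = ["Ald1", "Ald2", "Ald3"]  # tail group 1
acids = ["Acid1", "Acid2"]          # tail group 2

def enumerate_library(amines, isocyanides, aldehydes, acids):
    """Yield one record per unique combination of the four components."""
    for amine, iso, ald, acid in product(amines, isocyanides, aldehydes, acids):
        yield {
            "id": f"{amine}-{iso}-{ald}-{acid}",
            "head": amine,
            "linker": iso,
            "tail_1": ald,
            "tail_2": acid,
        }

library = list(enumerate_library(amines, isocyanides, aldehydes, acids))
# 2 * 2 * 3 * 2 = 24 combinations for these placeholder pools.
```

Because the library size is the product of the pool sizes, adding a few building blocks to each pool grows the candidate space quickly, which is what makes a large virtual library practical to define.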

Additional Information

The research also delved into the specific roles of lipid components in mRNA delivery, elucidating the relationship between lipid composition and transfection efficiency through molecular descriptors. These findings not only enriched the chemical library of ionizable lipids but also provided new tools and methods for future mRNA therapies. By innovatively combining machine learning with combinatorial chemistry, this study opened a new pathway for accelerating the development of mRNA delivery lipids and broader prospects for mRNA therapies.