Equivariant 3D Conditional Diffusion Model for Molecular Linker Design

In early-stage drug discovery, researchers face a daunting challenge: finding drug-like candidate molecules among approximately 10^60 possible molecular structures. One successful approach is to start from smaller “fragment” molecules, a strategy known as fragment-based drug design (FBDD). In FBDD, the first step is to computationally screen for fragments that bind to the target protein pocket; these fragments are then linked into a single compound. When connecting fragments, both their geometric conformations and the structure of the protein pocket must be considered in order to design potential drug molecules with high binding affinity.

Figure: The generative process of molecular linker design.

This paper introduces a new molecular linker design method called DiffLinker. It is an equivariant 3D conditional diffusion model that generates linker structures connecting any number of disconnected fragments. Unlike previous autoregressive methods, DiffLinker generates a linker for two or more fragments at once, and neither the number of linker atoms nor the attachment sites need to be specified in advance. The model can also take the target protein pocket into account, so that the generated molecules bind to it effectively.
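
To make the two key ideas in this paragraph concrete (equivariance, and conditioning on fragments that stay fixed), here is a minimal toy sketch. It is not the network used in the paper: the function `equivariant_update`, its distance-based weight, and the `step` parameter are hypothetical choices made only for illustration. The sketch moves "linker" atoms along distance-weighted relative vectors while leaving "fragment" atoms untouched, and then checks numerically that rotating the input and then updating gives the same result as updating and then rotating.

```python
import math


def equivariant_update(coords, is_linker, step=0.1):
    """Toy update: move each linker atom along distance-weighted relative vectors.

    Atoms with is_linker[i] == False are treated as fixed context (fragments)
    and are returned unchanged. The weights depend only on interatomic
    distances, which is what makes the update rotation- and
    translation-equivariant. This is an illustrative assumption, not the
    denoising network from the paper.
    """
    new_coords = []
    for i, xi in enumerate(coords):
        if not is_linker[i]:
            new_coords.append(tuple(xi))
            continue
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            dist2 = sum(d * d for d in diff)
            weight = 1.0 / (1.0 + dist2)  # invariant scalar weight
            for k in range(3):
                shift[k] -= step * weight * diff[k]
        new_coords.append(tuple(x + s for x, s in zip(xi, shift)))
    return new_coords


def rotate_z(coords, angle):
    """Rotate all points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


# Two fixed fragment atoms and two movable linker atoms (made-up coordinates).
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (1.0, 1.5, 0.5), (3.0, -1.0, 0.2)]
is_linker = [False, False, True, True]
angle = 0.7

rotate_then_update = equivariant_update(rotate_z(coords, angle), is_linker)
update_then_rotate = rotate_z(equivariant_update(coords, is_linker), angle)

# Largest coordinate-wise disagreement between the two orders of operations.
max_gap = max(
    abs(p - q)
    for u, v in zip(rotate_then_update, update_then_rotate)
    for p, q in zip(u, v)
)
```

If the update is equivariant, `max_gap` should be no larger than floating-point rounding error. A real diffusion model would apply many such learned updates, starting from noise, rather than a single hand-written one.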

The paper is by Ilia Igashov et al. The first author is at École Polytechnique Fédérale de Lausanne (EPFL), with collaborators from MIT, Microsoft Research AI4Science, and the University of Oxford. It was published in the journal Nature Machine Intelligence in April 2024.

The researchers first trained and evaluated DiffLinker on public datasets such as ZINC and CASF. Compared with other linker design methods, DiffLinker generated molecules with higher synthetic accessibility, better drug-likeness, and greater chemical diversity, and it reproduced the reference molecular structures better.

Next, the researchers proposed a new and more challenging GEOM dataset, in which each example contains three or more fragments to be connected. On this dataset, 93% of the molecules generated by DiffLinker were valid, whereas other methods could hardly generate any valid molecules.
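
As a rough illustration of how summary statistics such as a validity rate can be computed over a batch of generated molecules, here is a small sketch. It makes several simplifying assumptions that are not from the paper: molecules are represented as strings that are already canonicalized, a failed validity check is represented by `None`, and the helper name `summarize_samples` is hypothetical. In practice, validity checking and canonicalization would be delegated to a cheminformatics toolkit.

```python
def summarize_samples(samples, reference):
    """Compute simple batch statistics for generated molecules.

    samples:   list of canonical molecule strings, with None marking a sample
               that failed validity checks (an assumption of this sketch).
    reference: canonical string of the known reference molecule.
    """
    total = len(samples)
    valid = [s for s in samples if s is not None]
    unique = set(valid)
    return {
        # Fraction of all samples that passed the validity check.
        "validity": len(valid) / total if total else 0.0,
        # Fraction of valid samples that are distinct from one another.
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # Whether the reference molecule appears among the valid samples.
        "recovered": reference in unique,
    }


# Made-up batch: four samples, one invalid, one duplicate.
samples = ["c1ccccc1CCO", None, "c1ccccc1CCO", "c1ccccc1CCN"]
stats = summarize_samples(samples, reference="c1ccccc1CCN")
```

For this made-up batch, three of four samples are valid (validity 0.75), two of the three valid samples are distinct, and the reference string is recovered.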

In the pocket-conditioned setting, DiffLinker takes the atoms of the target protein pocket as an additional point cloud and generates linker structures compatible with the pocket’s conformation. Compared with methods that condition only on the fragments, DiffLinker under pocket constraints generated molecules with fewer strained conformations and higher predicted binding affinities.
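
One way to picture pocket conditioning is as a single point cloud in which fragment and pocket atoms are fixed context and only the linker atoms are free to move. The sketch below assembles such a cloud. Everything in it is an illustrative assumption rather than the paper's implementation: the function name `build_conditioned_cloud`, the dictionary layout, the choice to initialize linker atoms as Gaussian noise around the fragment centroid, and the toy coordinates.

```python
import random


def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def build_conditioned_cloud(fragment_xyz, pocket_xyz, n_linker_atoms, seed=0):
    """Assemble fixed context atoms and noisy linker atoms into one cloud.

    Fragment and pocket atoms are marked fixed=True (context); linker atoms
    are marked fixed=False and start as Gaussian noise around the fragment
    centroid. This layout is a hypothetical illustration.
    """
    rng = random.Random(seed)
    center = centroid(fragment_xyz)
    cloud = []
    for xyz in fragment_xyz:
        cloud.append({"xyz": tuple(xyz), "role": "fragment", "fixed": True})
    for xyz in pocket_xyz:
        cloud.append({"xyz": tuple(xyz), "role": "pocket", "fixed": True})
    for _ in range(n_linker_atoms):
        noisy = tuple(c + rng.gauss(0.0, 1.0) for c in center)
        cloud.append({"xyz": noisy, "role": "linker", "fixed": False})
    return cloud


# Made-up coordinates: two fragment atoms and three pocket atoms.
fragments = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
pocket = [(3.0, 4.0, 0.0), (3.0, -4.0, 0.0), (3.0, 0.0, 4.0)]
cloud = build_conditioned_cloud(fragments, pocket, n_linker_atoms=4)

n_fixed = sum(1 for atom in cloud if atom["fixed"])
n_free = sum(1 for atom in cloud if not atom["fixed"])
```

A denoising model would then update only the atoms with `fixed` set to `False`, while reading the positions of all atoms, including the pocket, as input.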

Finally, the researchers illustrated the practical use of DiffLinker in drug design through three real-world case studies: the design of HSP90 inhibitors, IMPDH inhibitors, and JNK inhibitors. The results showed that DiffLinker could reproduce known active molecules reported in the literature and also generate new molecules with better physicochemical properties and binding affinities.

This research provides a new and efficient tool for fragment-based drug design. DiffLinker can design structures connecting different fragments more quickly and accurately, and it determines the number of linker atoms and the connection sites automatically, avoiding the complex process of manual design. Moreover, by conditioning generation on protein pocket information, DiffLinker can provide more informative candidate molecules for the FBDD strategy. Overall, DiffLinker shows strong promise and is expected to accelerate the discovery and development of new drugs.