FlowPacker: Protein Side-Chain Packing with Torsional Flow Matching

A protein's three-dimensional structure is determined by its amino acid sequence, and its function depends strongly on that structure. Side-chain conformations play a crucial role in protein folding, protein-protein interactions, and de novo protein design, so predicting them accurately is key to understanding folding mechanisms, designing novel proteins, and studying protein interactions. However, traditional physics-based methods, which rely on empirical scoring functions, discrete rotamer libraries, and Markov chain Monte Carlo (MCMC) sampling, are often limited by inefficient search algorithms and inaccurate scoring functions.

In recent years, artificial intelligence has made significant progress in protein structure prediction and design. Deep learning models, such as AlphaFold and DiffPack, have demonstrated superior performance in the task of protein side-chain packing. Nevertheless, existing methods still have room for improvement in terms of runtime and accuracy. To address this, Jin Sub Lee and Philip M. Kim developed FlowPacker, a model based on torsional flow matching and equivariant graph attention networks, aiming to enhance the accuracy and efficiency of protein side-chain conformation prediction.

Source of the Paper

This paper was co-authored by Jin Sub Lee and Philip M. Kim, affiliated with the Department of Molecular Genetics and the Department of Computer Science at the University of Toronto, Canada. The paper was published on January 9, 2025, in the journal Bioinformatics, titled “FlowPacker: Protein Side-Chain Packing with Torsional Flow Matching.” The code and data for the paper have been made publicly available on GitLab for use by academia and industry.

Research Process

1. Model Design

The core of FlowPacker combines torsional flow matching with an equivariant graph attention network. Flow matching is a generative modeling paradigm that trains continuous normalizing flows (CNFs) in a simulation-free manner, and it is reported to offer stronger performance and faster training convergence than traditional diffusion models. FlowPacker applies this idea to side-chain torsion angles: because each angle is periodic, the full set of side-chain torsion angles of a protein lives on a high-dimensional torus, and FlowPacker defines its flow matching framework on that space to generate side-chain conformations.
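The idea of flow matching on a torus can be illustrated with a minimal sketch. It assumes a uniform prior over angles and a shortest-path (geodesic) interpolation between the prior sample and the true angles; the function names `wrap` and `sample_training_pair` are hypothetical and are not taken from the FlowPacker code base.

```python
import numpy as np

def wrap(angle):
    """Map angles in radians into the interval [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def sample_training_pair(chi_true, rng):
    """Build one flow matching training example on the torus.

    chi_true: array of ground-truth torsion angles in radians.
    Returns the time t, the interpolated angles chi_t, and the target
    vector field (assumption: constant-speed geodesic path).
    """
    # Assumed prior: every angle drawn uniformly on the circle.
    chi_0 = rng.uniform(-np.pi, np.pi, size=chi_true.shape)
    t = rng.uniform(0.0, 1.0)
    # Shortest signed rotation from the prior sample to the true angle.
    delta = wrap(chi_true - chi_0)
    # Point reached after moving a fraction t along that rotation.
    chi_t = wrap(chi_0 + t * delta)
    return t, chi_t, delta

rng = np.random.default_rng(0)
chi_true = np.array([[1.0, -2.0, 3.0, 0.5]])  # one residue, four chi angles
t, chi_t, target_field = sample_training_pair(chi_true, rng)
```

Under these assumptions, a network would be trained to output `target_field` given `chi_t`, `t`, and the backbone context. Wrapping the angle difference matters because 179° and -179° are only 2° apart on the circle.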

2. Dataset Preparation

The study used two datasets for training: the BC40 dataset and the PDB-S40 dataset. The BC40 dataset contains PDB structures selected at a 40% sequence similarity cutoff, while the PDB-S40 dataset consists of monomeric protein structures extracted from a PDB snapshot dated July 28, 2023, and clustered at 40% sequence similarity. The test set comprised target protein structures from CASP13, CASP14, and CASP15.

3. Model Training

FlowPacker’s model architecture is based on EquiformerV2, with a maximum angular momentum (lmax) of 3, a channel dimension of 256, and a total of 18 million trainable parameters. The model was trained on four NVIDIA A100 GPUs for 300 epochs, with a total training time of approximately six days. During training, the model learns to predict the conditional vector field, with a loss that penalizes the difference between the predicted and target fields; integrating the learned field at inference time then yields side-chain conformations.
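A training objective of this kind can be sketched as a masked mean squared error between predicted and target vector fields. The sketch below is illustrative only: the mask reflects the general fact that residues have between zero and four side-chain chi angles, while the function name and array layout are assumptions rather than the authors' implementation.

```python
import numpy as np

def masked_flow_matching_loss(pred_field, target_field, chi_mask):
    """Mean squared error over the chi angles that actually exist.

    pred_field, target_field: arrays of shape (n_residues, 4).
    chi_mask: same shape, 1 where the residue has that chi angle, else 0.
    """
    sq_err = (pred_field - target_field) ** 2
    n_valid = chi_mask.sum()
    # Guard against an all-zero mask (e.g. a chain of glycines).
    return float((sq_err * chi_mask).sum() / max(n_valid, 1))

target = np.array([[0.5, -1.0, 0.0, 0.0],
                   [0.2,  0.0, 0.0, 0.0]])
mask = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 0]])
pred = target + 0.1  # a prediction that is off by 0.1 rad everywhere
loss = masked_flow_matching_loss(pred, target, mask)
```

Masking keeps padded positions from contributing to the loss, so short side chains such as serine are not penalized for angles they do not have.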

4. Inference Strategy

During inference, FlowPacker generates side-chain conformations by integrating the learned vector field with an Euler solver over an exponential time schedule. A separate confidence model was also developed to select, from the generated samples, the one with the lowest predicted error.
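The inference procedure can be sketched as Euler integration on the torus followed by a confidence-based selection step. The exact form of the exponential schedule is not given here, so the formula in `exponential_schedule` (with its `rate` parameter) is an assumption, as are all function names; the vector field is a toy stand-in for the trained network.

```python
import numpy as np

def wrap(angle):
    """Map angles in radians into the interval [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def exponential_schedule(n_steps, rate=3.0):
    """Time points from 0 to 1 with exponentially shrinking step sizes.

    Assumed form: t = (1 - exp(-rate * s)) / (1 - exp(-rate)), s uniform in [0, 1].
    """
    s = np.linspace(0.0, 1.0, n_steps + 1)
    return (1.0 - np.exp(-rate * s)) / (1.0 - np.exp(-rate))

def euler_sample(vector_field, chi_init, n_steps=10, rate=3.0):
    """Integrate d(chi)/dt = vector_field(chi, t) from t=0 to t=1 with Euler steps."""
    ts = exponential_schedule(n_steps, rate)
    chi = chi_init
    for t_now, t_next in zip(ts[:-1], ts[1:]):
        chi = wrap(chi + (t_next - t_now) * vector_field(chi, t_now))
    return chi

def select_most_confident(samples, predicted_errors):
    """Return the sample whose predicted error is lowest."""
    return samples[int(np.argmin(predicted_errors))]

# Toy field that points each angle toward a fixed target along the shortest arc.
chi_target = np.array([1.0, -2.5, 3.0])

def toy_field(chi, t):
    return wrap(chi_target - chi) / (1.0 - t)

rng = np.random.default_rng(0)
samples = [euler_sample(toy_field, rng.uniform(-np.pi, np.pi, size=3)) for _ in range(4)]
best = select_most_confident(samples, predicted_errors=[0.9, 0.2, 0.5, 0.7])
```

In a real pipeline the toy field would be replaced by the trained network and the predicted errors would come from the confidence model; the number of steps and the schedule rate are tunable trade-offs between runtime and accuracy.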

Main Results

1. Performance Evaluation

FlowPacker outperformed other baseline models, including the physics-based Rosetta and deep learning-based AttnPacker and DiffPack, on the CASP13, CASP14, and CASP15 test sets. FlowPacker achieved the best results in metrics such as angle mean absolute error (angle MAE), angle accuracy, and atom root-mean-square deviation (atom RMSD).
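For readers unfamiliar with these metrics, the sketch below shows one common way to compute them. It assumes angles in degrees, a per-angle accuracy threshold of 20° (a convention widely used in side-chain packing benchmarks, not confirmed here as the paper's exact setting), and ignores the special handling of symmetric side chains; the function names are hypothetical.

```python
import numpy as np

def angular_error_deg(pred_deg, true_deg):
    """Absolute angular difference in degrees, respecting periodicity."""
    diff = (pred_deg - true_deg + 180.0) % 360.0 - 180.0
    return np.abs(diff)

def angle_mae(pred_deg, true_deg, mask):
    """Mean absolute error over the chi angles marked valid in mask."""
    err = angular_error_deg(pred_deg, true_deg)
    return float((err * mask).sum() / mask.sum())

def angle_accuracy(pred_deg, true_deg, mask, threshold_deg=20.0):
    """Fraction of valid chi angles predicted within threshold_deg."""
    err = angular_error_deg(pred_deg, true_deg)
    correct = (err <= threshold_deg) & (mask == 1)
    return float(correct.sum() / mask.sum())

def atom_rmsd(pred_xyz, true_xyz):
    """Root-mean-square deviation over atoms; inputs have shape (n_atoms, 3)."""
    sq_dist = ((pred_xyz - true_xyz) ** 2).sum(axis=-1)
    return float(np.sqrt(sq_dist.mean()))

pred = np.array([[179.0, 10.0], [-50.0, 0.0]])
true = np.array([[-179.0, 40.0], [-45.0, 0.0]])
mask = np.array([[1, 1], [1, 0]])
mae = angle_mae(pred, true, mask)       # errors of 2, 30 and 5 degrees
acc = angle_accuracy(pred, true, mask)  # two of three angles within 20 degrees
```

The periodic difference is the important detail: a naive subtraction would score 179° versus -179° as a 358° error instead of 2°.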

2. Side-Chain Inpainting

FlowPacker also demonstrated its ability in partial side-chain inpainting tasks. By randomly masking 5% to 75% of residues, FlowPacker was able to generate accurate side-chain conformations based on the provided structural context, indicating its potential application in protein design.
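The masking setup for such an inpainting experiment can be sketched as follows. This is an illustration of the evaluation protocol only, under the assumption that masked residues are chosen uniformly at random and that unmasked side chains are kept exactly as given; how FlowPacker feeds the known side chains to the network is not described here, and the function names are hypothetical.

```python
import numpy as np

def random_residue_mask(n_residues, fraction, rng):
    """Boolean mask that marks a random fraction of residues for repacking."""
    n_masked = int(round(fraction * n_residues))
    masked = np.zeros(n_residues, dtype=bool)
    masked[rng.choice(n_residues, size=n_masked, replace=False)] = True
    return masked

def inpaint(chi_known, chi_sampled, masked):
    """Keep known chi angles and fill in sampled ones at masked residues.

    chi_known, chi_sampled: arrays of shape (n_residues, 4).
    masked: boolean array of shape (n_residues,).
    """
    return np.where(masked[:, None], chi_sampled, chi_known)

rng = np.random.default_rng(0)
n_residues = 40
chi_known = rng.uniform(-np.pi, np.pi, size=(n_residues, 4))
chi_sampled = rng.uniform(-np.pi, np.pi, size=(n_residues, 4))  # stand-in for model output
masked = random_residue_mask(n_residues, fraction=0.25, rng=rng)
chi_final = inpaint(chi_known, chi_sampled, masked)
```

Sweeping `fraction` from 0.05 to 0.75 reproduces the range of masking levels described above, with accuracy then measured only on the masked residues.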

3. Multimeric Complexes

Although FlowPacker was primarily trained on single-chain proteins, the study also tested its performance on antibody-antigen complexes. The results showed that FlowPacker outperformed Rosetta in both CDRH3 and full variable chain (FV) side-chain packing tasks, demonstrating its ability to extend to the prediction of side chains in multimeric complexes.

Conclusion and Significance

FlowPacker significantly improved the accuracy and efficiency of protein side-chain conformation prediction by introducing torsional flow matching and equivariant graph attention networks. The model not only excelled on single-chain proteins but also handled tasks such as partial side-chain inpainting and multimeric complex prediction, showcasing its broad application potential in protein design and structural biology.

Research Highlights

  1. Novel Torsional Flow Matching Framework: FlowPacker is the first to apply torsional flow matching to protein side-chain packing, providing a more efficient generative modeling approach.
  2. Equivariant Graph Attention Networks: By using EquiformerV2, FlowPacker better captures the symmetries of protein structures, enhancing the model’s expressive power.
  3. Multitask Capability: FlowPacker not only performs well on single-chain proteins but also handles partial side-chain inpainting and multimeric complex prediction tasks, demonstrating its broad application prospects.

Future Directions

The research team proposed several future research directions, including improving the prediction of mutational effects using unsupervised or supervised learning, aligning generative models with preference data to enhance biophysical plausibility, and exploring new representations of side-chain conformations. Additionally, the performance of FlowPacker could be further improved through autoregressive sampling and uncertainty analysis.

FlowPacker provides an efficient and accurate solution for protein side-chain packing tasks, laying a solid foundation for future protein design and structural biology research.