Molecular Evolution of Protein Sequences and Codon Usage in Monkeypox Viruses

Research Background

The 2022 Monkeypox virus (MPXV) outbreak caused significant global public health concern, yet the evolutionary mechanisms of MPXV are still not fully understood. MPXV is a linear double-stranded DNA virus belonging to the family Poxviridae, subfamily Chordopoxvirinae, and genus Orthopoxvirus. Its genome is approximately 197 kb and encodes about 200 genes. MPXV can infect a range of animals, including humans, non-human primates, and rodents. Like Variola virus (VARV) and Vaccinia virus (VACV), MPXV can cause disease and death in humans.

MPXV was first discovered in 1958 in an animal facility in Denmark and was first isolated from a human case in the Democratic Republic of Congo in 1970. Before 2022, MPXV circulated mainly in Central and West African countries, with occasional imported cases in other regions. In May 2022, the UK reported the first case of the 2022 MPXV outbreak, which grew into a global epidemic and was declared a Public Health Emergency of International Concern by the World Health Organization on July 23, 2022. As of September 11, 2023, a total of 90,439 confirmed cases had been reported in 115 countries and regions worldwide.

Based on phylogenetic analysis, MPXV is divided into two main lineages: lineage I (“Central African” lineage) and lineage II (“West African” lineage), with the latter further subdivided into IIa and IIb sublineages. Most cases in the 2022 outbreak belong to the IIb sublineage.

Paper Source

The paper was authored by Shan Kejia, Wu Changcheng, Tang Xiaolu, Lu Ruojian, and Hu Yaling, from the School of Life Sciences at Peking University, Beijing Sinovac Biotech Co., Ltd., and the National Institute for Viral Disease Control and Prevention, Chinese Center for Disease Control and Prevention. It was published in Genomics, Proteomics & Bioinformatics in 2024.

Research Methods

To study the evolutionary patterns of MPXV, the research team downloaded 2,789 MPXV genome sequences from the NCBI and GISAID databases and carried out the following steps:

a) Research Workflow

  1. Genome Sequence Download and Classification: The research team classified the downloaded virus genome sequences into four lineages (I, IIa, IIb-a, IIb-b) for analysis.

  2. Positive Selection Signal Detection: The team calculated, for each gene and each pair of lineages, the ratio of the nonsynonymous to the synonymous substitution rate (ω = dN/dS). In most comparisons ω was less than 1, indicating that purifying selection is the main driving force in MPXV gene evolution. However, in the comparison between lineage I and lineage II, 12 genes showed median ω values greater than 1, and the opg027 gene showed signals of positive selection in all comparisons.

  3. Protein Evolution Acceleration Analysis: Using the TreeTime analysis tool, the team estimated the evolutionary rate of 756 IIb sublineage genomes and found that protein sequence evolution in the MPXV variants of the 2022 outbreak accelerated over time.

  4. Epistasis Analysis: Through linkage disequilibrium (LD) analysis of single-nucleotide polymorphisms (SNPs) in the virus genome, the research team found significant epistatic effects between these mutations.

  5. Codon Usage Bias Analysis: Codon Adaptation Index (CAI) analysis showed that MPXV genes tend to use more non-optimal codons than human genes. The CAI of MPXV also decreased over time. Fatality declined across lineages in parallel with CAI, but it is unclear whether this relationship is coincidental or causal.
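
The ω statistic in step 2 can be illustrated with a small calculation. The sketch below is not the paper's pipeline: it assumes a simple counting approach in the style of Nei and Gojobori with a Jukes-Cantor correction, and the function names and the input counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Convert a proportion of differing sites into substitutions per site."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def omega(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return dN/dS from difference counts and site counts (illustrative only)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        # No synonymous change observed: the ratio is unbounded or undefined.
        return math.inf if dn > 0 else math.nan
    return dn / ds


def classify(w):
    """Map an omega value to the usual qualitative interpretation."""
    if math.isnan(w):
        return "undefined"
    if w > 1:
        return "positive selection"
    if w < 1:
        return "purifying selection"
    return "neutral"


# Hypothetical counts for two genes compared between two lineages.
print(classify(omega(2, 700, 6, 300)))  # few nonsynonymous changes
print(classify(omega(6, 700, 1, 300)))  # excess of nonsynonymous changes
```

Published analyses usually rely on likelihood-based estimators and statistical tests rather than a bare threshold, so an ω above 1 in a sketch like this is only a starting point for closer inspection.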

b) Main Research Findings

  1. Positive Selection Signals: The opg027 gene showed significant positive selection signals in the comparison between lineage I and lineage II, suggesting adaptive changes during lineage differentiation.

  2. Protein Evolution Acceleration: Analysis of 756 MPXV genomes from the 2022 outbreak demonstrated accelerated evolution over time, especially in non-synonymous mutations.

  3. Epistasis Between Mutations: The study found that many mutations in the MPXV genome exhibit strong epistatic relationships, which may affect the adaptability and evolution rate of virus variants.

  4. Codon Usage Bias: The codon adaptation index of MPXV decreased over time. Lineage I had the most optimized codon usage, while the IIb-b sublineage was the least optimized.
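
To make the codon adaptation index in the last finding concrete, here is a minimal sketch of the standard definition: each codon gets a weight equal to its frequency relative to the most frequent synonymous codon in a reference gene set, and the CAI of a gene is the geometric mean of those weights. The reference counts below are invented for illustration and are not human codon usage data; the function names are also hypothetical.

```python
import math

# Hypothetical codon counts from a reference gene set (NOT real human data).
# Only three amino acids are included to keep the example short.
REFERENCE_COUNTS = {
    "K": {"AAA": 20, "AAG": 80},
    "E": {"GAA": 30, "GAG": 70},
    "F": {"TTT": 40, "TTC": 60},
}


def relative_adaptiveness(reference_counts):
    """Weight of a codon = its count / highest count among its synonyms."""
    weights = {}
    for codon_counts in reference_counts.values():
        top = max(codon_counts.values())
        for codon, count in codon_counts.items():
            weights[codon] = count / top
    return weights


def cai(sequence, weights):
    """Geometric mean of the weights of all scorable codons in `sequence`."""
    sequence = sequence.upper()
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


weights = relative_adaptiveness(REFERENCE_COUNTS)
print(cai("AAGGAGTTC", weights))  # only the preferred codons
print(cai("AAAGAATTT", weights))  # only the less preferred codons
```

A sequence built from the preferred codons scores at the top of the scale, and one built from the rarer synonyms scores lower, which is the sense in which a falling CAI means less optimized codon usage.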

c) Conclusions

Through evolutionary analysis of MPXV protein sequences and codon usage, this study offers new perspectives on the molecular mechanisms of MPXV transmission and host adaptation. It points to a potential role of positive selection in lineage differentiation and to a possible link between codon usage bias and virus lethality, although whether that link is causal remains open.

d) Research Highlights

  1. Revealing Positive Selection: The study revealed positive selection signals of opg027 in MPXV lineage differentiation.

  2. Evolutionary Acceleration: The research demonstrated for the first time that MPXV variants from the 2022 outbreak show accelerated evolution in protein sequences.

  3. Epistasis Between Mutations: Significant epistatic relationships were found between mutations in the virus genome, providing new insights for further understanding virus adaptability.

  4. Codon Usage Bias: The study elucidated that MPXV tends to use non-optimized codons and explored the relationship between this tendency and virulence.
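
For the epistasis highlight (item 3 above), the underlying linkage disequilibrium measure can be sketched as follows. This is an illustrative calculation of the common r² statistic for two biallelic sites, assuming each viral genome is treated as one haplotype with alleles coded 0 (reference) and 1 (mutant); the function name and the toy data are hypothetical and do not come from the paper.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic sites.

    `haplotypes` is a list of (a, b) tuples, one per genome, where a and b
    are the alleles (0 or 1) at the first and second site.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n  # mutant frequency at site 1
    p_b = sum(b for _, b in haplotypes) / n  # mutant frequency at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Toy data: two mutations that always occur together, and two that do not.
linked = [(1, 1)] * 5 + [(0, 0)] * 5
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(linked))
print(ld_r2(unlinked))
```

High r² shows that two mutations co-occur more often than their frequencies predict. That pattern is consistent with epistasis but can also arise from shared ancestry, so it is best read as supporting evidence rather than proof.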

Research Significance

By analyzing the MPXV genome in detail, this study deepens our understanding of virus evolution mechanisms and suggests new directions for future vaccine development. Its findings on positive selection in the opg027 gene and on accelerated protein sequence evolution may help in developing more effective control measures.


Together, these analyses give a detailed picture of the molecular evolution of MPXV and provide a basis for future basic and applied research on the virus.