Strain Tracking in Complex Microbiomes Using Synteny Analysis Reveals Per-Species Modes of Evolution

Background

Microbial populations differentiate into distinct strains through single-nucleotide mutations and through structural variations such as recombination, insertions, and deletions. Most strain comparison methods quantify differences in single nucleotide polymorphisms (SNPs) and overlook structural variations, even though recombination is a significant driver of phenotypic diversity in many species, including human pathogens. The paper introduces SynTracker, a tool that compares microbial strains using gene synteny (the order of sequence blocks in homologous regions of genomes), a rich source of genomic information that existing strain comparison tools have not fully utilized. SynTracker has low sensitivity to SNPs, does not require a database, and is robust to sequencing errors. It outperforms existing tools in tracking strains from metagenomic data, and it is particularly suited to phages, plasmids, and other data-limited scenarios. Applied to single-species datasets and human gut metagenomes, SynTracker combined with SNP-based tools can detect strains enriched in either point mutations or structural changes, providing insight into the in-situ evolution of microbes.

Source of the Paper

This paper was authored by Hagay Enav, Inbal Paz, and Ruth E. Ley, affiliated with the Max Planck Institute for Biology (Tübingen, Germany) and the University of Tübingen (Tübingen, Germany). The study was published in Nature Biotechnology (DOI: https://doi.org/10.1038/s41587-024-02276-2).

Detailed Research Process

Process Overview

  1. Identification of Homologous Regions:

    • Select a reference genome and divide it into 1-kbp central regions.
    • Convert each sample's metagenomic assembly into a BLAST database and run a high-stringency BLAST search with the central regions as queries (minimum identity of 97%, minimum query coverage of 70%).
    • For each BLAST hit, retrieve the target sequence together with its 2-kbp flanking regions, and collect the retrieved sequences into a “bin” specific to that region.
  2. Calculation of Region-Specific Synteny Scores:

    • Treat each collection of homologous sequences derived from the same central region as its own region-specific bin.
    • Perform pairwise alignments within each bin to identify synteny blocks and calculate pairwise synteny scores, based on the number of synteny blocks identified and the overlap between the two sequences.
  3. Calculation of the Average Pairwise Synteny Score (APSS):

    • For each pair of metagenomic samples (or genomes), randomly select n regions for alignment and calculate the APSS by averaging the pairwise synteny scores of these regions.
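
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not SynTracker's actual code: the dictionary keys `identity_pct` and `query_cov_pct`, the function names, and the choice to clip a flanked window at the contig ends are all assumptions made for the example. The pairwise synteny scores are taken as given inputs, since the scoring formula itself is not reproduced here.

```python
import random
from statistics import mean

# Thresholds and flank size as stated in the process overview.
MIN_IDENTITY_PCT = 97.0
MIN_QUERY_COV_PCT = 70.0
FLANK_BP = 2000


def passes_blast_filter(hit):
    """Keep a hit only if it meets both stringency thresholds.

    `hit` is a hypothetical dict with percent identity and query coverage.
    """
    return (hit["identity_pct"] >= MIN_IDENTITY_PCT
            and hit["query_cov_pct"] >= MIN_QUERY_COV_PCT)


def flanked_window(hit_start, hit_end, contig_length, flank=FLANK_BP):
    """Extend a hit (1-based, inclusive) by `flank` bp on each side.

    Assumption: the window is clipped at the contig boundaries.
    """
    low, high = sorted((hit_start, hit_end))  # handles reverse-strand hits
    return max(1, low - flank), min(contig_length, high + flank)


def apss(region_scores, n, rng=None):
    """Average the synteny scores of n randomly chosen regions.

    `region_scores` maps region id -> pairwise synteny score for ONE pair
    of samples. Returns None when fewer than n regions are shared.
    """
    if len(region_scores) < n:
        return None
    rng = rng or random.Random()
    chosen = rng.sample(sorted(region_scores), n)
    return mean(region_scores[region] for region in chosen)


if __name__ == "__main__":
    print(passes_blast_filter({"identity_pct": 98.2, "query_cov_pct": 85.0}))
    print(flanked_window(5000, 5999, 20000))  # (3000, 7999)
```

Returning `None` when a sample pair shares fewer than n regions keeps such pairs out of the comparison instead of averaging over an inconsistent number of regions; how the real tool treats these pairs is not stated in this summary.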

Main Experiments and Results

  1. Performance Testing and Sensitivity:

    • Ran two in silico simulations with the BacMeta software, one introducing only SNPs and one introducing only insertions/deletions, and compared the synteny scores of the two simulations.
    • The minimum mean BLAST identity of regions was 99.5% in the SNP-only simulation and 99.79% in the insertion/deletion-only simulation.
    • Populations with insertions/deletions had significantly lower synteny scores in gene regions than SNP-only populations.
  2. Strain Classification:

    • Analyzed 140 Escherichia coli genomes randomly selected from 14 classified phylogenetic groups and constructed a phylogenetic tree.
    • SynTracker reproduced the published phylogenetic groupings while sampling as little as 2% of the genome.
  3. Threshold Setting for Strain Tracking:

    • In a longitudinal study of the human gut microbiome, identified the APSS threshold that maximized the accuracy of strain-pair classification, then used this threshold to analyze location patterns in different species.
    • The mother-infant strain-sharing ratio was highest in early infancy. As infants aged, the total number of shared strains increased, but shared strains made up a smaller proportion of all strains.
  4. High Sensitivity to Genetic Structural Variation:

    • Combined with SNP tracking tools, analyzed samples of Neisseria gonorrhoeae, drug-resistant Escherichia coli, Helicobacter pylori, and Streptomyces rimosus.
    • Results indicated that SynTracker is highly sensitive to structural variations, while SNP tools are more sensitive to point mutations.
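
The threshold-setting step in item 3 can be illustrated with a small sketch. It assumes a hypothetical input: a list of sample pairs, each with an APSS value and a known label saying whether the pair carries the same strain. It also assumes that a higher APSS means greater similarity. The function name and the exhaustive scan over observed APSS values are illustrative choices, not the paper's stated procedure.

```python
def best_apss_threshold(labeled_pairs):
    """Pick the APSS cutoff that maximizes classification accuracy.

    `labeled_pairs` is a list of (apss, is_same_strain) tuples. A pair is
    called "same strain" when its APSS is >= the threshold. Only observed
    APSS values are tried as candidate thresholds; ties keep the lowest one.
    Returns (threshold, accuracy).
    """
    if not labeled_pairs:
        raise ValueError("need at least one labeled pair")

    candidates = sorted({score for score, _ in labeled_pairs})
    best_threshold, best_accuracy = None, -1.0
    for threshold in candidates:
        correct = sum((score >= threshold) == is_same
                      for score, is_same in labeled_pairs)
        accuracy = correct / len(labeled_pairs)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


if __name__ == "__main__":
    # Made-up toy values, for illustration only.
    toy_pairs = [(0.99, True), (0.98, True), (0.97, True),
                 (0.93, False), (0.90, False)]
    print(best_apss_threshold(toy_pairs))  # (0.97, 1.0)
```

Once chosen, the threshold turns every APSS value into a same-strain or different-strain call, which is what makes downstream counts such as the number of strains shared between mother and infant possible.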

Main Conclusions and Research Significance

  • Scientific and Application Value: SynTracker is well suited to analyzing the in-situ evolution of microbes. Combining synteny and SNP analysis reveals different patterns of strain differentiation within species. This is valuable for basic science and also gives applied research, such as pathogen tracking and drug resistance studies, a new tool and method.

  • Research Highlights:

    • Novelty: Applies gene synteny to strain comparison, a source of information that existing strain comparison tools have not fully utilized, and so expands the comparative framework of genomic analysis tools.
    • High Efficiency and Accuracy: Classifies and tracks strains accurately from only a small fraction of the genome.
    • Broad Applicability: Suitable for data-limited contexts such as plasmids, phages, and strain tracking in rare species.

Other Valuable Information

  • Open Source and Method Sharing: SynTracker is available as open-source software on GitHub, so other researchers can use and improve it.
  • Future Prospects: Combining SNP-based and synteny-based analysis tools could support closer study of the molecular mechanisms of species evolution and further reveal how environmental and evolutionary pressures shape microbial genome diversity.

This research broadens the understanding of how microbial populations evolve and offers a practical tool that researchers can use to track strains more accurately and to study the complexity and dynamics of microbiomes.