Leveraging a Phased Pangenome for Haplotype Design of Hybrid Potato

Academic Background

Potato (Solanum tuberosum L.) is one of the most important tuber crops globally, feeding over 1.3 billion people annually across more than 120 countries. However, its tetraploid genome and clonal propagation make breeding slow, so beneficial traits are difficult to accumulate rapidly through traditional methods. To accelerate potato improvement, scientists have proposed a seed-propagated hybrid system based on diploid inbred lines. The development of such lines is hindered by numerous deleterious variants, which severely reduce potato growth and overall fitness. Understanding the nature of these variants and finding ways to eliminate them has therefore become a key focus of current hybrid potato research.

Additionally, most published diploid potato genomes are unphased, which obscures crucial information on haplotype diversity and heterozygosity. To address this, the researchers developed a phased potato pangenome that reveals structural variants (SVs) and haplotype diversity in the potato genome and provides a foundation for future potato breeding.

Source of the Paper

This paper was co-authored by Lin Cheng, Nan Wang, Zhigui Bao, and other scientists from multiple institutions, including the Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, the University of Liège in Belgium, and the Max Planck Institute for Biology in Germany. It was published online on December 2, 2024, in the journal Nature under the title “Leveraging a Phased Pangenome for Haplotype Design of Hybrid Potato.”

Research Process and Results

1. Construction of the Phased Pangenome

The researchers first selected 31 diploid potato varieties, including 10 wild species and 19 cultivated species, and generated genome sequences for 60 haplotypes. They assembled these genomes with high accuracy using PacBio HiFi sequencing and Hi-C technology. The average assembled size of each haplotype was 811 Mb, with a contig N50 of 12.25 Mb. Using the Hi-C data, they anchored the haplotype assemblies onto pseudo-chromosomes, achieving an anchoring rate of 95.17%.
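For readers unfamiliar with the contig N50 statistic quoted above, the following minimal Python sketch shows how it is usually computed: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions, not data from the paper.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths (same unit in and out)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy assembly (hypothetical lengths in Mb, not taken from the paper).
toy_contigs_mb = [40, 30, 20, 10]
toy_n50 = contig_n50(toy_contigs_mb)
print(toy_n50)
```

In the toy assembly the two longest contigs (40 + 30) already cover at least half of the 100 Mb total, so the N50 is the length of the second one.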

To construct the potato pangenome, the researchers used two methods: the PanGenome Graph Builder (PGGB) and Minigraph-Cactus. Ultimately, they built a pangenome graph (PPG-v.1.0) containing 60 haplotypes, comprising 248.64 million nodes and 345.61 million edges, with a total sequence length of 3,076 Mb. Compared to linear reference genomes, the pangenome graph better captures the diversity of the potato genome.
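The node, edge, and total-length figures above are the kind of summary that can be read directly from a graph file. Pangenome graphs are commonly exchanged in GFA, a tab-separated text format in which S lines are nodes (sequence segments), L lines are edges, and P lines are paths such as haplotypes. The sketch below counts these on a tiny hand-written graph; the function name and the toy graph are assumptions for illustration and have nothing to do with the actual PPG-v.1.0 file.

```python
def summarize_gfa(lines):
    """Count nodes (S lines), edges (L lines) and total node sequence length."""
    nodes = edges = total_bp = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            nodes += 1
            seq = fields[2]
            if seq != "*":
                total_bp += len(seq)
            else:
                # Sequence omitted: fall back to an LN:i:<length> tag if present.
                for tag in fields[3:]:
                    if tag.startswith("LN:i:"):
                        total_bp += int(tag[5:])
                        break
        elif fields[0] == "L":
            edges += 1
    return nodes, edges, total_bp


# Toy graph: node 2 is a 1-bp insertion carried by hapA but skipped by hapB.
TOY_GFA = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tTTA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
    "P\thapA\t1+,2+,3+\t*",
    "P\thapB\t1+,3+\t*",
]
toy_summary = summarize_gfa(TOY_GFA)
print(toy_summary)
```

The toy graph also shows why a graph holds more sequence than any single linear genome: the 8 bp stored across its nodes include the insertion that hapB lacks.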

2. Origin and Dynamics of Structural Variants

The researchers further analyzed structural variants (SVs) in the potato genome and found that 90.6% of SVs were associated with transposable elements (TEs). In particular, SVs mediated by long terminal repeat retrotransposons (LTR/Gypsy) were significantly more abundant than those mediated by other types of TEs. The researchers also found that TE-mediated rearrangements (TEMRs) were widespread in the potato genome, especially in cultivated species.

By comparing the genomes of wild and cultivated species, the researchers found that heterozygosity was significantly higher in cultivated species than in wild species (14.0% vs. 9.5%), indicating extensive hybridization during potato domestication. Additionally, the researchers identified numerous haplotype-specific inversions in cultivated species, which may be related to tuber formation and growth in potatoes.

3. Identification and Elimination of Deleterious Structural Variants

The researchers identified 19,625 putatively deleterious structural variants (DSVs) and found that these DSVs tended to occur in a heterozygous state in the potato genome. In particular, 97% of DSVs were heterozygous in cultivated species, suggesting that these deleterious variants were “sheltered” during domestication and thereby escaped negative selection.

To eliminate these deleterious variants, the researchers developed a computational design method based on the pangenome graph to design ideal potato haplotypes (IPHs). By simulating recombination events, they designed two ideal haplotype combinations (IPHs-A and IPHs-E) that carry significantly fewer deleterious variants and can serve as targets for future potato breeding.
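The general idea of designing a low-burden haplotype by recombining existing ones can be illustrated with a small dynamic-programming sketch. This is not the authors' method, whose details are not given in this summary. It assumes a simplified model in which a chromosome is split into bins, each candidate haplotype has a count of deleterious variants per bin, and each simulated crossover between haplotypes costs a fixed penalty. All names, the penalty, and the toy counts are hypothetical.

```python
def design_mosaic(burden, switch_cost):
    """Pick one haplotype per bin to minimize deleterious burden plus crossovers.

    burden: dict mapping haplotype name -> list of per-bin deleterious counts
            (all lists must have the same length).
    switch_cost: penalty added each time the mosaic changes haplotype.
    Returns (total_cost, list of chosen haplotype names, one per bin).
    """
    names = sorted(burden)
    n_bins = len(burden[names[0]])
    # best[h] = (cost, path) of the cheapest mosaic that ends on haplotype h.
    best = {h: (burden[h][0], [h]) for h in names}
    for b in range(1, n_bins):
        new_best = {}
        for h in names:
            candidates = []
            for prev in names:
                cost, path = best[prev]
                step = 0 if prev == h else switch_cost
                candidates.append((cost + step + burden[h][b], path + [h]))
            new_best[h] = min(candidates, key=lambda c: c[0])
        best = new_best
    return min(best.values(), key=lambda c: c[0])


# Hypothetical per-bin deleterious counts for two parental haplotypes.
toy_burden = {"hapA": [0, 0, 3, 3], "hapB": [2, 2, 0, 0]}
cheap_cost, cheap_path = design_mosaic(toy_burden, switch_cost=1)
costly_cost, costly_path = design_mosaic(toy_burden, switch_cost=10)
print(cheap_cost, cheap_path)
print(costly_cost, costly_path)
```

With a low crossover penalty the sketch joins the clean half of hapA to the clean half of hapB. With a high penalty it keeps a single parental haplotype, which mirrors the trade-off between purging deleterious variants and the number of recombination events a breeder would need.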

4. The “Broken-Window Effect” and Accumulation of Deleterious Variants

The researchers also found that deleterious variants tended to cluster around DSVs in the potato genome, a phenomenon referred to as the “Broken-Window Effect.” Specifically, deleterious single nucleotide polymorphisms (DSNPs) around DSVs were significantly more abundant in the coupling phase (on the same haplotype as the DSV) than in the repulsion phase (on the homologous haplotype). This effect may arise because DSVs reduce the recombination rate in surrounding regions, leading to the local accumulation of deleterious variants.
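The coupling versus repulsion comparison can be made concrete with a short sketch. In a phased diploid genome, a DSNP near a heterozygous DSV is in coupling phase if it lies on the same haplotype as the DSV and in repulsion phase if it lies on the other haplotype. The function below counts both within a fixed window; the window size, the coordinates, and the 0/1 haplotype labels are illustrative assumptions, and the paper's actual statistical test is not reproduced here.

```python
def count_phase(dsv_pos, dsv_hap, dsnps, window):
    """Count DSNPs near a DSV that are in coupling vs. repulsion phase.

    dsv_pos: position of the DSV on the chromosome.
    dsv_hap: haplotype (0 or 1) carrying the DSV.
    dsnps:   iterable of (position, haplotype) pairs for deleterious SNPs.
    window:  maximum distance from the DSV to include a DSNP.
    Returns (coupling_count, repulsion_count).
    """
    coupling = repulsion = 0
    for pos, hap in dsnps:
        if abs(pos - dsv_pos) <= window:
            if hap == dsv_hap:
                coupling += 1   # same haplotype as the DSV
            else:
                repulsion += 1  # homologous haplotype
    return coupling, repulsion


# Hypothetical DSNPs around a DSV at position 1000 on haplotype 0.
toy_dsnps = [(600, 0), (900, 0), (1200, 0), (1100, 1), (2000, 0), (400, 1)]
toy_counts = count_phase(dsv_pos=1000, dsv_hap=0, dsnps=toy_dsnps, window=500)
print(toy_counts)
```

In this toy example three DSNPs fall in coupling phase and one in repulsion phase within the window, the direction of bias that the “Broken-Window Effect” describes.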

Conclusions and Significance

By constructing a phased potato pangenome, this study revealed the structural variants and haplotype diversity of the potato genome, in particular the widespread heterozygosity and deleterious variants in cultivated species. The researchers also developed a computational method for designing ideal potato haplotypes, providing an important tool for future hybrid potato breeding.

The scientific value of this study lies in what it reveals about structural variants and their evolutionary dynamics in the potato genome, and in the insights it offers for genome research and breeding of other clonally propagated crops. Additionally, the proposed “Broken-Window Effect” offers a new perspective on how deleterious variants accumulate in genomes.

Research Highlights

  1. Construction of the Phased Pangenome: For the first time, researchers constructed a potato pangenome graph containing 60 haplotypes, providing an important resource for understanding potato genome diversity and structural variation.
  2. Origin and Dynamics of Structural Variants: The study revealed the significant role of TEs in potato genome structural variation, particularly the widespread presence of LTR/Gypsy-mediated SVs in cultivated species.
  3. Identification and Elimination of Deleterious Variants: The researchers identified numerous deleterious structural variants and successfully reduced these variants through computational design, offering new tools for potato breeding.
  4. Discovery of the “Broken-Window Effect”: The study proposed the “Broken-Window Effect” of DSVs in the genome for the first time, providing new theoretical support for understanding the accumulation mechanisms of deleterious variants.

This study not only provides important theoretical support for potato breeding but also offers new ideas and methods for genome research and breeding of other crops.