Widespread Exclusive Yin Yang Haplotypes in the Human Genome

Research Background

In genomic studies, yin yang haplotypes are pairs of haplotypes that differ at every site. Previous independent reports had described individual cases of unique yin yang haplotypes, but no systematic search had been conducted. To better understand how yin yang haplotypes are distributed and how they form, this study carried out a detailed, systematic search across the entire genome.

Research Source

This article was written by David Curtis and William Amos from the UCL Genetics Institute and the Department of Zoology, University of Cambridge, respectively. It was published online in the European Journal of Human Genetics on June 12, 2023.

Research Process

Sample Acquisition and Preparation:
  • Used high-coverage whole-genome sequence data for 2504 individuals, downloaded from the 1000 Genomes Project.
  • The data covered 26 populations, grouped into 5 super populations: African (AFR), American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS).

Data Processing:
  • Extracted all biallelic autosomal and X chromosome SNPs with minor allele frequency (MAF) >= 0.1.
  • Calculated the average genotype distance for each pair of variants to identify groups of SNPs in complete or nearly complete linkage disequilibrium (LD).
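The two quantities used in this step can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes diploid genotypes coded as alternate-allele counts (0, 1, 2), and it assumes that "average genotype distance" means the mean absolute difference in allele count between two SNPs, taken over whichever allele labelling gives the smaller value. The function names are hypothetical, and X chromosome hemizygosity is ignored.

```python
import numpy as np

def minor_allele_frequency(genotypes: np.ndarray) -> np.ndarray:
    """Per-SNP MAF for an (n_individuals, n_snps) array of 0/1/2 allele counts."""
    alt_freq = genotypes.astype(float).mean(axis=0) / 2.0
    return np.minimum(alt_freq, 1.0 - alt_freq)

def mean_genotype_distance(g1: np.ndarray, g2: np.ndarray) -> float:
    """Mean absolute genotype difference between two SNPs (assumed definition).

    Both allele labellings are tried, so a pair in complete LD scores 0
    whether the alternate alleles travel together or on opposite haplotypes.
    """
    a = g1.astype(float)
    b = g2.astype(float)
    same_phase = np.abs(a - b).mean()
    flipped_phase = np.abs(a - (2.0 - b)).mean()
    return float(min(same_phase, flipped_phase))

# Toy data: 4 individuals (rows) by 3 SNPs (columns).
toy = np.array([[0, 2, 0],
                [1, 1, 2],
                [2, 0, 0],
                [1, 1, 0]])

maf = minor_allele_frequency(toy)
kept = toy[:, maf >= 0.1]  # MAF filter quoted in the text
d01 = mean_genotype_distance(toy[:, 0], toy[:, 1])
d02 = mean_genotype_distance(toy[:, 0], toy[:, 2])
```

In the toy data, the first two SNPs are perfect mirror images of each other, so their distance is 0 under this definition, while the third SNP is unrelated to the first.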

Yin Yang Haplotype Identification:
  • Searched for chains of variants in which the average genotype distance between each pair of variants did not exceed 0.0015 and each pair was separated by no more than 9 other variants.
  • Defined chains containing at least 20 such variants as unique yin yang haplotypes.
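One way to picture the chain search is a greedy scan along the chromosome. The sketch below is an illustration under stated assumptions, not the published algorithm. It assumes the distance and spacing conditions apply to consecutive members of a chain, and that `distance(i, j)` is a caller-supplied function returning the average genotype distance between variants `i` and `j`. The function and constant names are hypothetical; the three threshold values are the ones quoted above.

```python
from typing import Callable, List

MAX_DISTANCE = 0.0015  # maximum average genotype distance
MAX_SKIPPED = 9        # at most 9 other variants between chain neighbours
MIN_CHAIN_LENGTH = 20  # minimum number of variants in a chain

def find_chains(
    n_variants: int,
    distance: Callable[[int, int], float],
    max_distance: float = MAX_DISTANCE,
    max_skipped: int = MAX_SKIPPED,
    min_length: int = MIN_CHAIN_LENGTH,
) -> List[List[int]]:
    """Greedily link variants (indexed in genomic order) into chains."""
    used = [False] * n_variants
    chains: List[List[int]] = []
    for seed in range(n_variants):
        if used[seed]:
            continue
        chain = [seed]
        last = seed
        while True:
            # Look ahead over the next max_skipped + 1 variants.
            window_end = min(last + max_skipped + 2, n_variants)
            nxt = next(
                (c for c in range(last + 1, window_end)
                 if not used[c] and distance(last, c) <= max_distance),
                None,
            )
            if nxt is None:
                break
            chain.append(nxt)
            last = nxt
        if len(chain) >= min_length:
            for idx in chain:
                used[idx] = True
            chains.append(chain)
    return chains

# Toy example: variants sharing a label are treated as being in complete LD.
labels = ["a", "a", "a", "b", "a", "a"]
toy_distance = lambda i, j: 0.0 if labels[i] == labels[j] else 1.0
toy_chains = find_chains(len(labels), toy_distance, min_length=5)
strict_chains = find_chains(len(labels), toy_distance, max_skipped=0, min_length=5)
```

With the default spacing rule, the toy chain skips over the unrelated variant at index 3. With `max_skipped=0` the chain is broken there and no chain reaches the minimum length.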

Filtering and Confirmation:
  • Excluded duplicated sequences and putative yin yang haplotypes covering known alternative sequences or patches.
  • Obtained the number of individuals with RR, RA, or AA genotypes for each SNP.
  • Identified and excluded possible duplicated sequences using criteria based on high read depth and the standard deviation of allele balance.
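The per-SNP summaries mentioned here are simple to compute. The following sketch is illustrative only. It assumes genotypes coded as alternate-allele counts and assumes allele balance is the fraction of reads supporting the alternate allele in each heterozygous call. The function names are hypothetical, and no exclusion thresholds are shown because the text does not state them.

```python
import math
from collections import Counter
from typing import Dict, Sequence

import numpy as np

def genotype_counts(alt_allele_counts: Sequence[int]) -> Dict[str, int]:
    """Count individuals with RR (0), RA (1) and AA (2) genotypes at one SNP."""
    tally = Counter(int(g) for g in alt_allele_counts)
    return {"RR": tally[0], "RA": tally[1], "AA": tally[2]}

def allele_balance_sd(ref_reads: Sequence[int], alt_reads: Sequence[int]) -> float:
    """Standard deviation of alt-read fraction across heterozygous calls.

    Inputs are read counts for heterozygous individuals only (an assumption).
    Returns NaN when no call has any reads.
    """
    ref = np.asarray(ref_reads, dtype=float)
    alt = np.asarray(alt_reads, dtype=float)
    total = ref + alt
    covered = total > 0
    if not covered.any():
        return math.nan
    balance = alt[covered] / total[covered]
    return float(balance.std())

counts = genotype_counts([0, 1, 1, 2, 0])
balanced = allele_balance_sd([10, 10], [10, 10])  # every call is 50/50
skewed = allele_balance_sd([5, 15], [15, 5])      # calls at 75% and 25% alt reads
```

A well-behaved SNP should show heterozygotes clustered near a 50/50 read split, so a large spread is one signal that reads from a duplicated region may be piling up at the site.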

Subsequent Analysis:
  • Calculated recombination rates and background variation for all yin yang haplotypes.
  • Evaluated the ancestral origin of variants in yin yang haplotypes using whole-genome sequence data from chimpanzees, Neanderthals, and modern humans.
  • Performed gene and gene ontology analysis, looking for associations between yin yang haplotypes and specific phenotypes.

Main Findings

Distribution and Characteristics of Yin Yang Haplotypes

  • The study identified 5114 unique yin yang haplotypes, each containing an average of 34.8 SNPs and spanning an average of 15.7 kb. Together they cover 80 Mb, about 2.6% of the human genome.
  • The recombination rate within yin yang haplotypes averaged 0.061 cM/Mb, significantly lower than the genome-wide average recombination rate.
  • The density and heterogeneity of single nucleotide polymorphisms (SNPs) within each super population did not differ greatly between yin yang haplotypes and other genomic regions, although heterogeneity was slightly higher within yin yang haplotypes.
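The coverage figures reported above can be checked with a short calculation. The haplotype count and mean length come from the text. The genome size of roughly 3,100 Mb is an assumption introduced here, as is the simplification that the haplotypes do not overlap one another.

```python
# Figures quoted in the text.
n_haplotypes = 5114
mean_length_kb = 15.7

# Assumption: approximate human genome size, not stated in the source.
genome_size_mb = 3100.0

# Total span, assuming the haplotypes do not overlap.
total_mb = n_haplotypes * mean_length_kb / 1000.0
percent_of_genome = 100.0 * total_mb / genome_size_mb

print(f"total span: {total_mb:.1f} Mb")               # total span: 80.3 Mb
print(f"share of genome: {percent_of_genome:.1f}%")   # share of genome: 2.6%
```

Under these assumptions, the product of count and mean length agrees with the stated totals of about 80 Mb and 2.6%.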

Ancestral Origin and Formation Mechanism

  • For most yin yang haplotypes, only some of the variants were also present in the chimpanzee and Neanderthal genomes, indicating that these haplotypes formed gradually rather than through a single mutation event.
  • For some haplotypes, the chimpanzee and Neanderthal genomes carried completely consistent ancestral alleles, which gives an indication of when these haplotypes formed in modern humans.
  • Some haplotypes showed mixed ancestral origins, providing evidence for the formation of yin yang haplotypes through early recombination events between haplotypes.
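The comparison against an outgroup genome can be illustrated with a small classifier. This is a simplified sketch, not the study's method. It assumes each yin yang pair is given as two allele strings of equal length, and that the outgroup (chimpanzee or Neanderthal) allele at each site is either known or marked with a placeholder such as `N`. The function name and category labels are hypothetical.

```python
def classify_outgroup(yin: str, yang: str, outgroup: str) -> str:
    """Summarise which member of a yin yang pair the outgroup alleles match.

    Sites where the outgroup allele matches neither haplotype (for example
    a missing call coded as 'N') are treated as uninformative.
    """
    if not (len(yin) == len(yang) == len(outgroup)):
        raise ValueError("allele strings must have equal length")
    matches_yin = 0
    matches_yang = 0
    for a, b, o in zip(yin, yang, outgroup):
        if o == a:
            matches_yin += 1
        elif o == b:
            matches_yang += 1
    informative = matches_yin + matches_yang
    if informative == 0:
        return "uninformative"
    if matches_yin == informative:
        return "all_yin"
    if matches_yang == informative:
        return "all_yang"
    return "mixed"

consistent = classify_outgroup("ACGT", "GTAC", "ACGT")
mixed = classify_outgroup("ACGT", "GTAC", "ACAC")
missing = classify_outgroup("ACGT", "GTAC", "NNNN")
```

In this toy scheme, a result of `all_yin` or `all_yang` corresponds to an outgroup whose alleles are fully consistent with one haplotype, while `mixed` corresponds to an outgroup that carries alleles from both.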

Gene and Phenotype Associations

  • About 42.5% of yin yang haplotypes overlapped with at least one gene, but no obvious gene ontology enrichment was observed.
  • The study found associations between the dominant SNPs of yin yang haplotypes and various phenotypes, but firm conclusions were difficult to draw because sample sizes were insufficient and population variability was not adequately controlled for.
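The gene overlap statistic above amounts to an interval intersection test. A minimal sketch follows, assuming haplotypes and genes are given as `(chromosome, start, end)` tuples with inclusive coordinates. The function name and the toy coordinates are hypothetical, and a real analysis over thousands of regions would normally use an indexed interval structure rather than this quadratic scan.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates

def fraction_overlapping_genes(haplotypes: List[Region], genes: List[Region]) -> float:
    """Fraction of haplotypes that overlap at least one gene on the same chromosome."""
    if not haplotypes:
        return 0.0
    hits = 0
    for h_chrom, h_start, h_end in haplotypes:
        # Two inclusive intervals overlap when each starts before the other ends.
        if any(
            h_chrom == g_chrom and h_start <= g_end and g_start <= h_end
            for g_chrom, g_start, g_end in genes
        ):
            hits += 1
    return hits / len(haplotypes)

toy_haplotypes = [("1", 100, 200), ("1", 500, 600), ("2", 100, 200)]
toy_genes = [("1", 150, 300), ("2", 500, 900)]
toy_fraction = fraction_overlapping_genes(toy_haplotypes, toy_genes)
```

Only the first toy haplotype intersects a gene, so the toy fraction is one third.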

Scientific Value and Application Value of the Research

This study demonstrates that unique yin yang haplotypes are not uncommon and cover a considerable portion of the human genome. Although their formation mechanism is still unclear, the existence of yin yang haplotypes may challenge existing models of genetic variation and how it spreads through populations. Moreover, since these haplotypes consist of contiguous DNA sequences that can be inherited intact, they can serve as important markers for studying the distribution of chromosomal regions through genetic history.

Research Highlights

  1. Widespread Presence: The study systematically discovered unique yin yang haplotypes widely present in the human genome, covering 2.6% of the genome.
  2. Low Recombination Rate: The recombination rate within yin yang haplotypes is much lower than the genome-wide average, indicating suppression of recombination events in these regions.
  3. Ancestral Origin: By comparing the haplotypes with genomic data from chimpanzees and Neanderthals, the study showed that they formed gradually.
  4. Phenotype Association: Although some phenotype associations were found, more samples and control for population variability are needed to draw more definitive conclusions.

Potential Challenges and Future Directions of the Research

While the study revealed the existence and preliminary characteristics of yin yang haplotypes, their formation mechanisms and functional significance require further investigation. Future work could focus on:
  • Identifying the specific molecular mechanisms that affect recombination rates in these regions.
  • Expanding sample size and population diversity to further validate associations between yin yang haplotypes and phenotypes and functions.
  • Using higher-resolution whole-genome sequencing technologies to confirm and explore the precise structure and function of yin yang haplotypes.

This paper provides a new perspective for genomic research and lays the foundation for future studies.