Population-wide DNA Methylation Polymorphisms at Single-Nucleotide Resolution in 207 Cotton Accessions Reveal Epigenomic Contributions to Complex Traits

Cotton Population-Level DNA Methylation Polymorphism Study Reveals Epigenomic Contributions to Complex Traits

Background and Research Motivation

In recent decades, genome-wide association studies (GWAS) have extensively characterized genomic and genetic diversity, providing a theoretical foundation for understanding variation in crop traits. However, the role of epigenetic modifications such as DNA methylation in regulating crop traits remains relatively unclear. DNA methylation, the addition of a methyl group to cytosines, is an essential epigenetic mark that regulates gene expression, maintains genome stability, and plays a critical role in various agronomic traits. Studies have found that methylation polymorphisms are associated with ecological adaptability traits, yet the contribution of epigenetic variation to crop traits in natural populations requires further exploration.

To address this need, scientists from Zhejiang University, Alibaba Group, and other research institutions conducted a systematic study, generating high-quality methylome, transcriptome, and genome data for 207 cotton accessions. Extending the classical framework of population genetics to epigenetics, the study analyzed the distribution of DNA methylation polymorphisms across genic regions and transposable elements and revealed the regulatory role of DNA methylation in cotton fiber traits. Published in Cell Research in 2024, this research provides an epigenomic resource to guide future crop improvement.

Research Design and Methods

Sample Collection and Multi-Omics Data Acquisition

The researchers grew 207 accessions from a cotton core germplasm collection (CUCP1) in Hangzhou, China, and collected fiber samples at 20 days post-anthesis (DPA) for whole-genome bisulfite sequencing (WGBS) and transcriptome sequencing (RNA-seq). Sequencing generated 54 billion WGBS reads and 4.42 billion RNA-seq reads, providing the data foundation for analyzing accession-specific gene expression and DNA methylation patterns.

After rigorous data processing and quality control, methylation polymorphism maps were constructed. Combining these data with population-level genetic information allowed comprehensive investigations into the epigenomic regulation of agronomic traits.

Genomic Distribution of DNA Methylation

The cotton genome showed methylation levels of approximately 72% in the CG context, 55% in CHG, and 11% in CHH (where H is A, C, or T), with significant variation across genic regions. Applying the concept of methylation disequilibrium (MD), defined by analogy with linkage disequilibrium, the study analyzed DNA methylation distributions and found that CG and CHG methylation is preferentially maintained across cell divisions, while CHH methylation is less stable.
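The context-specific levels quoted above can be illustrated with a minimal sketch. It assumes a forward-strand reference sequence and per-site WGBS read counts, and it uses the common "weighted" definition of methylation level (methylated reads divided by total reads, pooled per context). The function names and toy data are hypothetical; the study's actual pipeline is not described in this summary.

```python
from collections import defaultdict


def cytosine_context(seq, i):
    """Classify the cytosine at seq[i] (forward strand) as CG, CHG, or CHH.

    H stands for A, C, or T. Returns None if seq[i] is not a C, or if the
    downstream bases needed for the call are missing or ambiguous (e.g. N).
    """
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    first = seq[i + 1]
    if first == "G":
        return "CG"
    if first not in "ACT" or i + 2 >= len(seq):
        return None
    second = seq[i + 2]
    if second == "G":
        return "CHG"
    if second in "ACT":
        return "CHH"
    return None


def weighted_levels(sites):
    """Pool read counts per context and return methylated / total reads.

    `sites` is an iterable of (context, methylated_reads, total_reads).
    Contexts with zero coverage are omitted from the result.
    """
    meth = defaultdict(int)
    total = defaultdict(int)
    for context, m, n in sites:
        meth[context] += m
        total[context] += n
    return {c: meth[c] / total[c] for c in total if total[c] > 0}


# Toy example (made-up sequence and counts, not data from the study).
seq = "ACGTCAGCTTC"
contexts = {i: cytosine_context(seq, i) for i, base in enumerate(seq) if base == "C"}
print(contexts)

toy_sites = [("CG", 8, 10), ("CG", 6, 10), ("CHG", 5, 10), ("CHH", 1, 10)]
print(weighted_levels(toy_sites))
```

Pooling reads before dividing weights each site by its coverage, so poorly covered cytosines do not dominate the estimate. A per-site mean would be an equally valid but different summary.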

Association Between Methylation and Gene Expression

Using cis-methylation quantitative trait locus (cis-meQTL) analysis, the study identified numerous single methylation polymorphisms (SMPs) that influence gene expression and analyzed their genomic distribution. Notably, 36.39% of genes with cis expression quantitative trait methylation (cis-eQTM) associations were independent of genetic variation, suggesting a layer of regulation distinct from SNPs.
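The core idea of a cis association between methylation and expression can be sketched as a correlation, across accessions, between the methylation level at one site and the expression of a nearby gene. The sketch below is illustrative only: the 10 kb cis window, the function names, and the toy values are assumptions, and the study's actual statistical model (covariates, multiple-testing correction) is not described in this summary.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no association signal
    return sxy / math.sqrt(sxx * syy)


def is_cis(site_pos, gene_start, gene_end, window=10_000):
    """True if the site lies within `window` bp of the gene body.

    The 10 kb default is a hypothetical choice for illustration.
    """
    return gene_start - window <= site_pos <= gene_end + window


# Toy data: one methylation site and one gene measured in five accessions.
methylation = [0.9, 0.8, 0.6, 0.3, 0.1]   # fraction methylated per accession
expression = [1.2, 1.9, 4.1, 6.8, 9.3]    # expression level per accession

if is_cis(site_pos=52_300, gene_start=55_000, gene_end=58_000):
    r = pearson_r(methylation, expression)
    print(f"cis site-gene correlation: r = {r:.3f}")
```

A strongly negative r in this toy case mirrors the general pattern in which higher methylation accompanies lower expression. Deciding whether such an association is independent of genetic variation would additionally require testing nearby SNPs, which this sketch does not attempt.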

Constructing multi-omics regulatory networks, the researchers identified key genes involved in fiber development. For example, the CBL-interacting protein kinase 10 (CIPK10) gene was found to regulate fiber length, as confirmed through CRISPR/Cas9 gene editing experiments. This highlights the potential of DNA methylation data in crop improvement.

Key Findings

SMPs Outnumber SNPs Significantly

The study revealed that SMPs in the cotton genome are 100 times more numerous than SNPs. SMPs were highly enriched in genic regions, particularly introns and promoters, a pattern also observed in Arabidopsis, reinforcing the regulatory potential of methylation polymorphisms.

Methylation Polymorphism and Fiber Traits

Epigenome-wide association studies (EWAS) identified 1,715 epigenetic loci associated with yield and fiber quality traits. Only 2.10% of these loci overlapped with GWAS loci, suggesting that epigenetic loci contribute to phenotypic variation independently of genetic loci. CG and CHG methylation levels were negatively correlated with gene expression, particularly in promoter regions, while CHH methylation had a weaker regulatory impact.
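For scale, 2.10% of 1,715 loci is roughly 36 loci. An overlap figure of this kind can be computed as the fraction of EWAS intervals that intersect at least one GWAS interval on the same chromosome. The sketch below assumes loci are represented as (chromosome, start, end) tuples with inclusive coordinates; the overlap criterion actually used in the study is not stated in this summary, and the coordinates are made up.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def overlap_fraction(ewas_loci, gwas_loci):
    """Fraction of EWAS loci that intersect any GWAS locus."""
    if not ewas_loci:
        return 0.0
    hits = sum(
        1 for e in ewas_loci if any(overlaps(e, g) for g in gwas_loci)
    )
    return hits / len(ewas_loci)


# Toy loci (hypothetical coordinates, not results from the study).
ewas = [("A01", 100, 200), ("A01", 500, 600), ("D05", 100, 200), ("D05", 900, 950)]
gwas = [("A01", 150, 300), ("D05", 1000, 2000)]

print(f"{overlap_fraction(ewas, gwas):.2%} of EWAS loci overlap a GWAS locus")
```

The nested scan is quadratic in the number of loci, which is fine for a few thousand intervals. A sorted sweep or interval tree would be the usual choice for larger sets.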

Fiber Development Gene Network Construction

Integrating GWAS and EWAS data, researchers constructed a gene regulatory network (GRN) for cotton fiber development, involving 397 genes and 634 connections. This network included known fiber elongation-related genes, such as those encoding expansins and cellulose synthases. The epigenetic GRN highlighted the complex interplay of epigenetic and genetic regulation in fiber traits.

CIPK10 Gene Validation Through Gene Editing

Using CRISPR/Cas9, researchers knocked out the CIPK10 gene, which led to significantly shorter fibers, confirming its regulatory role. This finding demonstrates the relevance of DNA methylation data in identifying key functional genes and provides experimental evidence for using epigenomic resources in crop improvement.

Deep Learning Model for Functional Methylation Site Prediction

To predict methylation sites associated with gene expression regulation, the team developed the Deep Functional DNA Methylation Loci (DeepFDML) model. Combining convolutional neural network and Transformer architectures, the model achieved high predictive accuracy, with reported values of ROC = 0.82 and PRC = 0.78 (presumably the areas under the receiver operating characteristic and precision-recall curves). This underscores the potential of deep learning in functional epigenomic studies.
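To make the two evaluation metrics concrete, the sketch below computes an area under the ROC curve and a precision-recall summary for a binary classifier from labels and predicted scores. It is not DeepFDML and does not reproduce its evaluation. The labels and scores are toy values, and average precision is used here as one common summary of the precision-recall curve, which may differ from the exact measure reported for the model.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, counting ties as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the ranks where positives appear (scores descending).

    Tied scores are ordered arbitrarily by the sort, so results with many
    ties should be interpreted with care.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return precision_sum / hits


# Toy predictions: 1 = functional site, 0 = non-functional site.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(f"AUROC = {auroc(labels, scores):.3f}")
print(f"average precision = {average_precision(labels, scores):.3f}")
```

Reporting both metrics is informative when functional sites are rare, because the precision-recall view is more sensitive to false positives among the top-ranked predictions than the ROC view.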

Significance and Future Directions

This study extends our understanding of epigenomic contributions to complex crop traits, offering insights into the independent regulatory roles of DNA methylation. By identifying and validating SMPs associated with fiber traits, the research provides a valuable resource for cotton improvement. The introduction of DeepFDML represents a promising approach for functional methylation site prediction, potentially benefiting related species lacking population-scale methylation data.

Conclusion

This research highlights DNA methylation as an independent regulatory layer for crop improvement. The identification of functional methylation loci through deep learning models can further accelerate crop breeding, providing a foundation for future advancements in agricultural genomics.