Multiscale Footprints Reveal the Organization of Cis-Regulatory Elements
Multiscale Footprints Reveal the Role of Cis-Regulatory Elements in Cell Differentiation and Aging
Background
Gene regulation is a key mechanism in cell fate determination and disease development, and cis-regulatory elements (CREs) are central to it. CREs regulate gene expression dynamically through the binding of effector proteins such as transcription factors and nucleosomes. However, existing methods struggle to measure the binding dynamics of these proteins across the genome, especially at single-cell resolution. This makes it difficult to link the structure of CREs to their function, particularly during cell differentiation and aging.
To address this problem, a research team from the Broad Institute of MIT and Harvard, Harvard University, and other institutions developed PRINT, a computational method that identifies multiscale footprints of DNA-protein interactions from chromatin accessibility data. Building on PRINT, they developed Seq2Print, a deep learning framework that infers transcription factor and nucleosome binding from DNA sequence and decodes the regulatory logic of CREs. The study was published in Nature in 2024 under the title “Multiscale footprints reveal the organization of cis-regulatory elements.”
Research Team and Publication Information
The study was conducted by a team of researchers including Yan Hu, Max A. Horlbeck, Ruochi Zhang, and others, primarily from the Broad Institute of MIT and Harvard and Harvard University. By combining computational and experimental biology approaches, the team successfully revealed the dynamic changes of CREs during cell differentiation and aging. The paper was accepted on November 22, 2024, and published online in the same year.
Research Process and Results
1. Development of the Multiscale Footprint Detection Method (PRINT)
The research team first developed PRINT to detect multiscale footprints of DNA-protein interactions from chromatin accessibility data. A central challenge PRINT addresses is the sequence bias of the Tn5 transposase: Tn5 inserts more readily at some sequences than others even on protein-free DNA, which confounds footprint detection. To correct for this, the team trained a convolutional neural network (CNN) to predict Tn5 insertion preferences on deproteinized DNA. On bacterial artificial chromosome (BAC) data, this model significantly outperformed traditional k-mer and position weight matrix (PWM) bias models.
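To make the idea of a Tn5 bias model concrete, here is a minimal sketch of the *traditional k-mer baseline* that the CNN is compared against, not PRINT's CNN itself. It assumes integer insertion positions from deproteinized DNA and estimates, for each odd-length k-mer centered on a base, the average number of insertions per occurrence. The function names `kmer_bias_model` and `expected_insertions` are hypothetical.

```python
from collections import Counter


def kmer_bias_model(sequence, insertion_sites, k=3):
    """Estimate per-k-mer Tn5 insertion rates from naked-DNA insertions.

    Hypothetical k-mer baseline (not PRINT's CNN): each k-mer centered on a
    position gets the average number of insertions seen at its occurrences.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the k-mer is centered on the insertion")
    half = k // 2
    site_counts = Counter(insertion_sites)  # position -> number of insertions
    occurrences = Counter()
    insertions = Counter()
    for pos in range(half, len(sequence) - half):
        kmer = sequence[pos - half:pos + half + 1]
        occurrences[kmer] += 1
        insertions[kmer] += site_counts.get(pos, 0)
    # Bias = mean insertions observed per occurrence of the k-mer
    return {kmer: insertions[kmer] / occurrences[kmer] for kmer in occurrences}


def expected_insertions(sequence, bias, k=3):
    """Per-base expected insertion rate under the k-mer bias model."""
    half = k // 2
    return [
        bias.get(sequence[pos - half:pos + half + 1], 0.0)
        if half <= pos < len(sequence) - half else 0.0
        for pos in range(len(sequence))
    ]
```

A fixed-length lookup table like this cannot capture longer-range sequence context, which is one plausible reason a CNN trained on the same kind of naked-DNA data fits the bias better.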
PRINT generates footprint scores by statistically testing, at each position and across a range of footprint sizes, whether Tn5 insertions are significantly depleted relative to the flanking region beyond what Tn5 bias alone predicts. The team validated PRINT in vitro, showing that it accurately detected binding sites of transcription factors such as Myc/Max and CEBPA, whereas conventional ATAC-seq footprinting methods failed to distinguish these sites from background signal.
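The multiscale test can be sketched as follows, under loudly stated assumptions. For a center window of a given radius, we compare the fraction of insertions falling in the center (versus both flanks) with the fraction expected from the bias track, using a one-sided binomial test. PRINT's actual statistical model is not reproduced here; the binomial test, the fixed `flank` width, and the function names are all hypothetical simplifications.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation (small n only)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def footprint_score(obs, bias, pos, radius, flank):
    """-log10 p-value that the center window is depleted of insertions
    relative to its flanks, beyond what the bias track predicts.

    obs: integer insertion counts per base; bias: expected rate per base.
    """
    lo, hi = pos - radius, pos + radius + 1
    if lo - flank < 0 or hi + flank > len(obs):
        return 0.0  # window runs off the edge of the region
    center_obs = sum(obs[lo:hi])
    flank_obs = sum(obs[lo - flank:lo]) + sum(obs[hi:hi + flank])
    center_exp = sum(bias[lo:hi])
    flank_exp = sum(bias[lo - flank:lo]) + sum(bias[hi:hi + flank])
    n = center_obs + flank_obs
    if n == 0 or center_exp + flank_exp == 0:
        return 0.0  # nothing to test
    p_center = center_exp / (center_exp + flank_exp)
    p_value = binom_cdf(center_obs, n, p_center)  # one-sided: depletion
    return -math.log10(max(p_value, 1e-300))


def multiscale_footprints(obs, bias, radii, flank):
    """One footprint track per radius (small radii ~ TFs, large ~ nucleosomes)."""
    return {r: [footprint_score(obs, bias, pos, r, flank) for pos in range(len(obs))]
            for r in radii}


# Toy data: uniform coverage with an insertion-depleted 5-bp stretch.
obs = [5] * 40
obs[18:23] = [0] * 5
bias = [1.0] * 40  # flat bias, for illustration only
scores = multiscale_footprints(obs, bias, radii=(2, 5), flank=5)
```

Because the expected center fraction comes from the bias track, a region that is simply hard for Tn5 to cut does not score as a footprint, which is the point of correcting for bias before testing.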
2. Development of the Deep Learning Framework Seq2Print
Building on the multiscale footprints generated by PRINT, the team developed Seq2Print, which uses deep learning to predict multiscale footprints from DNA sequence and thereby infer transcription factor and nucleosome binding. Using only local DNA sequence as input, the model predicted nucleosome and transcription factor footprints in ATAC-seq data from HepG2 cells with high accuracy (overall correlation of 0.75).
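Two ingredients of this setup are easy to sketch: encoding the local DNA sequence as model input, and scoring predicted against observed footprints. The snippet below assumes the common one-hot encoding (A, C, G, T as four channels, other bases as all zeros) and assumes the reported "overall correlation" is a Pearson correlation over all positions and footprint scales; neither detail is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; N and other symbols -> zeros."""
    lookup = {"A": 0, "C": 1, "G": 2, "T": 3}
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in lookup:
            x[i, lookup[base]] = 1.0
    return x


def overall_correlation(pred, obs):
    """Pearson correlation between flattened predicted and observed footprints."""
    pred = np.asarray(pred, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    return float(np.corrcoef(pred, obs)[0, 1])
```

Flattening a (positions x scales) footprint matrix before correlating treats every position and scale equally, which is one simple way to summarize accuracy in a single number.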
A key feature of Seq2Print is its ability to dissect the sequence features within CREs and pinpoint critical transcription factor binding sites. For example, in one CRE, Seq2Print identified binding sites for transcription factors including NFE2L2 and NFYB, suggesting that these factors may help position nucleosomes.
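One generic way to ask which bases a sequence-to-footprint model relies on is in silico saturation mutagenesis: mutate each position to every alternative base and record how the prediction changes. The text does not say which attribution method Seq2Print uses, so treat this as an illustration of the general idea. `toy_model` is a hypothetical stand-in that just counts a made-up motif ("GATA") in place of a trained network.

```python
def toy_model(seq, motif="GATA"):
    """Hypothetical stand-in for a trained model: counts motif occurrences."""
    return float(sum(1 for i in range(len(seq) - len(motif) + 1)
                     if seq[i:i + len(motif)] == motif))


def saturation_mutagenesis(seq, model):
    """Mean change in model output when each position is mutated to each
    alternative base; strongly negative values mark important positions."""
    ref = model(seq)
    effects = []
    for i, base in enumerate(seq):
        deltas = [model(seq[:i] + alt + seq[i + 1:]) - ref
                  for alt in "ACGT" if alt != base]
        effects.append(sum(deltas) / len(deltas))
    return effects


effects = saturation_mutagenesis("CCGATACC", toy_model)
```

Here only the four bases of the motif have nonzero effect, so they stand out as the "binding site"; with a real model the same scan highlights whichever positions drive the predicted footprint.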
3. Application to Single-Cell Data and Analysis of Cell Differentiation Trajectories
The research team applied Seq2Print to single-cell ATAC-seq data from human bone marrow to analyze how CREs change during hematopoietic differentiation. They found that CREs are established and then expand in a sequential manner. In erythroid differentiation, for example, transcription factors such as GATA and TAL bind early, while KLF1 and NFE2 bind later. This sequential binding tracks the gradual expansion of CREs and sheds light on how enhancers are established.
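A simple way to turn footprint tracks along a differentiation trajectory into an "early vs. late" ordering is to record, for each factor, the first pseudotime at which its footprint score crosses a threshold. This is a hypothetical sketch: the threshold, the assumption that pseudotime bins are sorted in ascending order, and the generic factor names (`TF_A`, ...) with toy scores are all illustrative, not the study's procedure or data.

```python
def binding_onset(scores, pseudotime, threshold):
    """First pseudotime at which a footprint score reaches the threshold."""
    for t, s in zip(pseudotime, scores):
        if s >= threshold:
            return t
    return None  # never bound along this trajectory


def order_by_onset(tracks, pseudotime, threshold=2.0):
    """Return factors sorted from earliest to latest onset, plus all onsets."""
    onsets = {tf: binding_onset(s, pseudotime, threshold) for tf, s in tracks.items()}
    bound = [tf for tf, t in onsets.items() if t is not None]
    return sorted(bound, key=lambda tf: onsets[tf]), onsets


pseudotime = [0.0, 0.25, 0.5, 0.75, 1.0]  # assumed sorted ascending
tracks = {  # toy footprint scores per pseudotime bin
    "TF_C": [0, 0, 0, 2.5, 3],
    "TF_A": [0, 2.1, 3, 3, 3],
    "TF_B": [0, 0, 2.0, 3, 3],
    "TF_D": [0, 0, 0, 0, 0],
}
order, onsets = order_by_onset(tracks, pseudotime)
```

A hard threshold is sensitive to noise; smoothing the tracks first, or using the pseudotime of half-maximal score, are common alternatives.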
4. Changes in CREs During Aging
The team also used Seq2Print to analyze how CREs change in mouse hematopoietic stem cells (HSCs) during aging. They found that aging is accompanied by a widespread reduction in nucleosome footprints and a significant increase in ETS composite motifs. These changes may contribute to the dysregulation of gene expression during aging, particularly of genes associated with the decline of HSC function.
Conclusions and Significance
By developing PRINT and Seq2Print, this study revealed the dynamic changes of CREs during cell differentiation and aging. PRINT detects multiscale footprints from chromatin accessibility data, while Seq2Print uses deep learning to decipher the sequence features and regulatory logic of CREs. Together, these methods improve the accuracy of transcription factor binding prediction and offer new perspectives on the role of CREs in gene regulation.
Research Highlights
- Multiscale Footprint Detection: The PRINT method can detect DNA-protein interactions of varying sizes, significantly improving the sensitivity and specificity of footprint detection.
- Deep Learning Framework: Seq2Print uses deep learning models to parse sequence features of CREs, enabling the prediction of transcription factor and nucleosome binding and the identification of novel regulatory motifs.
- Single-Cell Resolution: The team applied Seq2Print to single-cell ATAC-seq data, revealing the dynamic changes of CREs during cell differentiation and aging.
- Aging-Related Discoveries: The study found that aging is accompanied by a widespread reduction in nucleosome footprints and an increase in ETS composite motifs, providing new insights into gene expression dysregulation during aging.
Additional Valuable Information
The research team also provided pre-trained models for PRINT and Seq2Print, as well as genome-wide Tn5 bias reference tracks, for use by other researchers. These resources will facilitate further analysis and application of chromatin accessibility data.
Overall, this study sheds light on the complex dynamics of CREs in gene regulation and provides new tools and perspectives for understanding cell fate determination and disease development.