Genome-Wide Repeat Landscapes in Cancer and Cell-Free DNA

A Panoramic View of Whole Genome Repetitive Sequences in Cancer and Circulating Free DNA

Research Overview

Research Background and Significance

Genetic changes in repetitive sequences are a significant characteristic of cancer and other diseases, but standard sequencing methods struggle to characterize these sequences. To address this, the study developed Artemis (Analysis of Repeat Elements in Disease), a method for identifying repetitive elements in whole-genome sequencing data. By analyzing tissue and plasma samples from patients with various cancers, the study explores how these elements change and assesses their potential for early cancer detection and disease monitoring.

Source of the Paper

This study was jointly conducted by Akshaya V. Annapragada, Noushin Niknafs, James R. White, and others, affiliated with the Sidney Kimmel Comprehensive Cancer Center and the Department of Medicine at Johns Hopkins University School of Medicine. The results were published in the March 13, 2024 issue of “Science Translational Medicine,” article number eadj9283.

Research Process

Overview of the Process

The overall process of this study includes the development of the Artemis method, data collection and processing, identification of repetitive elements, machine learning modeling, result validation, and clinical application evaluation. The subjects of the study encompass 2837 tissue and plasma samples from 1975 patients, including various types of cancer such as lung cancer, breast cancer, and colorectal cancer.

Data Collection and Processing

The study first applied a new kmer search method to the complete telomere-to-telomere (T2T) reference genome (CHM13) to identify kmers (short sequences of fixed length) de novo, finding a total of 1.2 billion 24-bp kmers. These kmers define 1280 types of repetitive elements, including long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), long terminal repeats (LTRs), transposable elements, and human satellite families.
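To make the kmer idea concrete, the following is a minimal sketch of counting 24-bp kmers in sequencing reads and tallying them by repeat type. It is an illustration under stated assumptions, not the Artemis implementation: the function names, the `kmer_to_type` lookup table, and the choice to merge each kmer with its reverse complement are all hypothetical and do not come from the source.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a kmer and its reverse complement.

    Assumption: strands are merged; the source does not say whether Artemis does this.
    """
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(seq, k=24):
    """Count canonical kmers of length k, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def repeat_type_counts(reads, kmer_to_type, k=24):
    """Tally kmer occurrences per repeat type.

    kmer_to_type is a hypothetical lookup from canonical kmer to repeat type,
    standing in for the kmer set derived from the reference genome.
    """
    totals = Counter()
    for read in reads:
        for kmer, n in count_kmers(read, k).items():
            repeat_type = kmer_to_type.get(kmer)
            if repeat_type is not None:
                totals[repeat_type] += n
    return totals
```

With a toy lookup and k=4, `repeat_type_counts(["ACGTACGT"], {"ACGT": "toyLINE", "CGTA": "toySINE"}, k=4)` assigns two kmer hits to each toy type. A real kmer table with over a billion entries would need a compact on-disk index rather than a Python dictionary.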

Identification and Analysis of Repetitive Elements

Using the Artemis method, the study analyzed changes in these repetitive elements across different types of cancer and found that 820 of them had not previously been reported as altered in cancer. Repetitive elements were also enriched in regions containing cancer driver genes, and their changes were associated with structural changes in the genome and with epigenetic state.

Establishment of the Machine Learning Model

Through machine learning analysis of the whole genome repetitive landscape and the fragmentation patterns of cell-free DNA (cfDNA), the study developed a predictive model capable of detecting early-stage lung cancer and liver cancer. The model demonstrated high accuracy in cross-validation and external validation cohorts, enabling non-invasive identification of the tissue of origin of tumors.
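As a rough illustration of how a cross-validated classifier over repeat-landscape features can be evaluated, here is a minimal sketch. The source does not specify the model used, so a plain L2-penalized logistic regression is swapped in; the synthetic data, the hyperparameters, and every function name are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500, l2=0.01):
    """Fit logistic regression by batch gradient descent with an L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Return predicted cancer probabilities for each feature vector."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_val_scores(X, y, folds=5, seed=0):
    """Score every sample with a model trained on the other folds only."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(X)
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i, s in zip(test, predict(model, [X[i] for i in test])):
            scores[i] = s
    return scores


def make_toy_data(n_per_class=40, seed=1):
    """Synthetic two-feature data: 'cancer' samples are shifted by 3 SD (toy assumption)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, shift in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)])
            y.append(label)
    return X, y
```

Calling `X, y = make_toy_data()` followed by `auc(cross_val_scores(X, y), y)` gives a cross-validated AUC for the toy data. Scoring each sample only with a model that never saw it is what keeps the estimate honest; an external validation cohort plays the same role at the level of whole datasets.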

Validation and Clinical Application Evaluation

The study results indicate that changes in the repetitive landscape are widespread in cancer genomes and can be detected via cfDNA, which opens a route to early cancer detection and disease monitoring. Specifically, the Artemis score produced by the machine learning model can distinguish between cancerous and normal tissues and correlates closely with overall survival and progression-free survival in patients.

Research Results

Initial Discovery of Genome-wide Repetitive Elements

Through de novo identification, the study found 1.2 billion kmers representing 1280 types of repetitive elements distributed across all chromosomes. Further analysis of these repeat landscapes revealed that 820 types of repetitive elements had not previously been reported as altered in cancer.

Genomic Distribution and Cancer Correlation of Repetitive Elements

The enrichment of repetitive elements in cancer genes suggests that these elements may play important roles in cancer development, for example by contributing to structural changes such as gene amplification, deletion, and rearrangement. The enrichment of repetitive elements at structural breakpoints in specific cancer types further supports their potential role in promoting these structural changes.
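One common way to quantify an enrichment claim of this kind is a one-sided hypergeometric test. The sketch below is a generic illustration and not the statistical procedure reported in the study; the function name and the toy numbers are assumptions.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for a hypergeometric draw.

    N: total genomic regions considered
    K: regions that overlap the feature of interest (e.g. cancer genes)
    n: regions that carry the repeat element
    k: regions that carry the repeat AND overlap the feature
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example: 10 regions, 3 overlap cancer genes, 4 carry the repeat,
# and all 3 cancer-gene regions are among those 4.
p_value = hypergeom_sf(3, 10, 3, 4)
```

A small p-value means the observed overlap would be unlikely if repeat-bearing regions were scattered at random with respect to cancer genes. In a genome-wide analysis, many such tests run in parallel and would need multiple-testing correction.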

Detection of Repetitive Sequence Landscape in cfDNA

The study demonstrated that even in low-coverage whole genome sequencing, changes in the repetitive sequence landscape can be reliably detected in cfDNA. Analysis revealed that in different cancer types, many repetitive elements in cfDNA show specific changes consistent with tumor tissue. Additionally, changes in the epigenetic state (such as histone marks) also affected the representation of these elements in cfDNA.
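Because sequencing depth varies between samples, per-repeat-type counts generally need to be normalized before a plasma sample can be compared with a reference panel. The following is a minimal sketch of one plausible approach, fraction normalization followed by z-scores against non-cancer samples. The source does not describe the normalization used, so the procedure and all names here are assumptions.

```python
from statistics import mean, stdev


def normalize(counts):
    """Convert raw per-type kmer counts to fractions of the sample total."""
    total = sum(counts.values())
    return {rtype: n / total for rtype, n in counts.items()}


def z_scores(sample, reference_panel):
    """Z-score each repeat type in a sample against a panel of reference samples.

    sample: dict of repeat type -> raw count
    reference_panel: list of such dicts from non-cancer individuals (hypothetical)
    """
    sample_frac = normalize(sample)
    panel = [normalize(ref) for ref in reference_panel]
    result = {}
    for rtype, frac in sample_frac.items():
        values = [p.get(rtype, 0.0) for p in panel]
        sd = stdev(values)
        # A type with no variation in the panel gets a neutral score of 0.
        result[rtype] = (frac - mean(values)) / sd if sd > 0 else 0.0
    return result
```

Dividing by the sample total removes the effect of overall coverage, so a low-coverage sample and a deep one can be placed on the same scale. The panel needs at least two samples for the standard deviation to be defined.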

Performance Evaluation of the Machine Learning Model

When applied to cfDNA, the Artemis score produced by the machine learning model effectively distinguishes cancerous from normal states. The score also correlates significantly with overall survival and progression-free survival, particularly in late-stage cancer patients, where a high Artemis score is associated with a poor prognosis.
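A standard way to examine the link between a score and survival is to split patients into high-score and low-score groups and compare their Kaplan-Meier curves. The sketch below implements the basic estimator; it is a generic illustration rather than the study's analysis, and the threshold and function names are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up time per patient
    events: 1 if the event (death or progression) occurred, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def split_by_score(scores, times, events, threshold):
    """Return Kaplan-Meier curves for patients at or above and below a score threshold."""
    high = [(t, e) for s, t, e in zip(scores, times, events) if s >= threshold]
    low = [(t, e) for s, t, e in zip(scores, times, events) if s < threshold]
    return (
        kaplan_meier([t for t, _ in high], [e for _, e in high]),
        kaplan_meier([t for t, _ in low], [e for _, e in low]),
    )
```

Censored patients leave the at-risk set without lowering the survival estimate, which is what lets the method use incomplete follow-up. A formal comparison between the two groups would add a log-rank test or a Cox model, neither of which is shown here.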

Clinical Application Potential

The study suggests that a combined model, integrating the Artemis score with other cfDNA fragmentation features, can support early detection, monitoring, and inference of the tissue of origin in cancer patients. The combined model shows high accuracy and reliability for detecting lung cancer and liver cancer in particular, and may provide new tools for early cancer screening and personalized treatment in future clinical use.

Conclusion

With the Artemis method, this study provides a new whole-genome analysis approach based on repetitive elements that can detect and characterize widespread changes in cancer. The results reveal extensive changes in repetitive sequences within cancer genomes and offer new methods for early cancer detection and disease monitoring. With further optimization and validation, the method could play a significant role in early cancer diagnosis and treatment.

Significance of the Study

This study offers important insights into changes in repetitive sequences within cancer genomes and their potential roles in cancer development. The Artemis method also provides a new strategy for non-invasive cancer detection using cfDNA, which could facilitate early detection and precision treatment. In addition, by analyzing multiple cancer types, the study provides data for further exploration of the genomic features that different tumors share and those that set them apart.