Limitations in Next-Generation Sequencing-Based Genotyping of Breast Cancer Polygenic Risk Score Loci
Background
In hereditary breast cancer (BC), polygenic risk scores (PRSs) are increasingly used for individual risk prediction. Computing a PRS reliably requires that genotyping reproduces the expected variant allele frequencies (AFs). However, genotyping PRS loci with next-generation sequencing (NGS) currently faces a number of technical limitations. This study addresses these limitations, which matter for improving and optimizing breast cancer risk assessment models.
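To make the dependence on AFs concrete, the following is a minimal sketch of a generic standardized PRS, not the CanRisk implementation. It assumes a PRS defined as a weighted sum of effect-allele dosages, Hardy-Weinberg equilibrium, and independent variants. The function names, weights, and AFs are hypothetical and chosen only for illustration.

```python
from math import sqrt

def raw_prs(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 per variant)."""
    return sum(w * d for w, d in zip(weights, dosages))

def expected_mean_sd(afs, weights):
    """Population mean and SD of the raw score.

    Assumes Hardy-Weinberg equilibrium and independent variants, so the
    expected dosage of a variant with allele frequency p is 2p and its
    variance is 2p(1 - p). Both are simplifying assumptions.
    """
    mean = sum(2 * p * w for p, w in zip(afs, weights))
    var = sum(2 * p * (1 - p) * w * w for p, w in zip(afs, weights))
    return mean, sqrt(var)

def standardized_prs(dosages, weights, afs):
    """Z-score of an individual's raw PRS relative to the expected population."""
    mean, sd = expected_mean_sd(afs, weights)
    return (raw_prs(dosages, weights) - mean) / sd

# Hypothetical three-variant score.
weights = [0.1, -0.05, 0.2]
dosages = [1, 1, 0]
reference_afs = [0.3, 0.5, 0.1]   # AFs the score was designed with
biased_afs = [0.3, 0.5, 0.2]      # third AF mis-estimated by genotyping

z_reference = standardized_prs(dosages, weights, reference_afs)
z_biased = standardized_prs(dosages, weights, biased_afs)
```

In this toy example the same individual sits at the population mean under the reference AFs but below it when one AF is overestimated, which is why systematic AF deviations at PRS loci are a concern.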
Research Source and Author Background
This study was conducted by the bioinformatics working group from the German Consortium for Hereditary Breast and Ovarian Cancer (GC-HBOC). The author team includes several scientists from various German universities and research institutions, specifically from Carl Gustav Carus University Hospital, Dresden University of Technology, University Hospital Münster, University Hospital Regensburg, University Hospital Cologne, Hannover Medical School, and University Hospital Tübingen. The study was published in the European Journal of Human Genetics in 2024.
Detailed Research Process
Research Flow
The research is divided into three main stages:
1. Analysis of PRS Variants in gnomAD: The study first analyzed the AFs of the PRS variants in the European samples of the gnomAD v3.1.2 database and checked whether the variants could be converted to the hg38 reference genome. Some variant positions could not be matched or were missing.
2. Verification in Real-World Datasets: Five GC-HBOC centers provided AFs of the PRS variants from real-world datasets, which were compared with the expected AFs from CanRisk. Up to 24 variants showed significant deviations.
3. Feasible Solutions for Clinical Diagnostics: The study proposed ways to improve genotyping performance in clinical diagnostics, such as using proxy alleles and alternative variant sites.
Experimental Details
- Sample Source: Five GC-HBOC centers provided datasets ranging from 339 to 1410 samples: the Institute of Medical Genetics and Applied Genomics (University Hospital Tübingen), the Institute for Clinical Genetics (Carl Gustav Carus University Hospital), the Department of Medical Genetics (University Hospital Münster), the Center for Familial Breast and Ovarian Cancer (University Hospital Cologne), and the Institute of Human Genetics (University Hospital Regensburg).
- Analysis Tools: Several variant callers were used, including DRAGEN, FreeBayes, and GATK, in different calling modes (forced/unforced). Genotype data were generated mainly by whole-genome sequencing (WGS) and custom cancer panels.
Data Analysis Methods
- Variant Annotation: The study used variant identifiers from dbSNP and variant annotations from gnomAD.
- Assessment and Conversion of Expected AFs: The study compared the expected AFs of the PRS variants in the CanRisk knowledge base with the AFs of non-Finnish European (NFE) samples in gnomAD v3.1.2. AFs that could not be matched or that deviated significantly were recorded in detail.
- Determination of the Deviation Threshold: The absolute differences between expected and observed AFs were sorted in descending order, and the "elbow method" was applied to this curve to set a threshold near the elbow point. AFs whose difference exceeded the threshold were treated as significantly deviated.
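The thresholding step can be sketched as follows. This summary does not specify which elbow criterion the study used, so the sketch assumes one common variant: after scaling rank and difference to [0, 1], the elbow is the point farthest below the straight line joining the first and last points of the sorted curve. The function name and the example AFs are hypothetical.

```python
def elbow_flags(expected_af, observed_af):
    """Return (threshold, flagged_indices) for AF deviations.

    Assumed elbow criterion: maximum distance below the chord of the
    descending curve of absolute differences, with both axes scaled to [0, 1].
    Variants whose difference is strictly above the elbow value are flagged.
    """
    diffs = [abs(e - o) for e, o in zip(expected_af, observed_af)]
    curve = sorted(diffs, reverse=True)
    n = len(curve)
    span = curve[0] - curve[-1]
    if n < 3 or span == 0:
        raise ValueError("need at least 3 variants with differing deviations")

    def depth_below_chord(i):
        x = i / (n - 1)                   # scaled rank
        y = (curve[i] - curve[-1]) / span  # scaled difference
        return 1.0 - x - y                # proportional to distance from x + y = 1

    elbow = max(range(n), key=depth_below_chord)
    threshold = curve[elbow]
    flagged = [i for i, d in enumerate(diffs) if d > threshold]
    return threshold, flagged

# Hypothetical data: three variants deviate strongly, the rest barely.
expected = [0.5] * 10
observed = [0.8, 0.25, 0.7, 0.52, 0.515, 0.49, 0.51, 0.505, 0.5, 0.5]
threshold, flagged = elbow_flags(expected, observed)
```

Scaling both axes first keeps the result independent of how many variants are compared; without it, the rank axis would dominate the distance calculation.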
Main Results
Data Generation and Analysis
- AF Deviations: Of the 332 PRS variants studied, 24 showed AFs in gnomAD v3.1.2 that deviated significantly from the expected AFs in CanRisk. These deviations were mostly related to technical artifacts, such as variant sites in low-complexity regions or sites failing Variant Quality Score Recalibration (VQSR) filtering.
- Variant Detection in Real-World Data: In the data provided by the participating centers, at least 11 and up to 23 sites per dataset showed significant AF deviations. The number depended not only on the sequencing technology but also on the variant detection tool and calling mode used.
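One general way in which the calling mode can shift an observed AF is the handling of sites with no emitted genotype. The sketch below is an illustration of that pitfall under stated assumptions, not a description of the pipelines used by the centers: genotypes are given as alternate-allele dosages (0, 1, 2), and `None` marks a sample without a record at the site. The function name and the `missing_as_ref` switch are hypothetical.

```python
def allele_frequency(dosages, missing_as_ref=False):
    """Alternate-allele frequency from per-sample dosages.

    dosages: list of 0, 1, 2, or None (no record for that sample).
    missing_as_ref: if True, a missing record counts as homozygous
    reference; if False, the sample is excluded from the denominator.
    Which choice is appropriate depends on whether a missing record means
    "no variant seen" or "no usable coverage" (an assumption to verify
    for each caller and calling mode).
    """
    if missing_as_ref:
        called = [0 if d is None else d for d in dosages]
    else:
        called = [d for d in dosages if d is not None]
    if not called:
        raise ValueError("no genotype calls available")
    return sum(called) / (2 * len(called))

# Hypothetical site: six samples, two without a record.
site = [0, 1, None, 2, None, 0]
af_excluding_missing = allele_frequency(site)                    # 3 / 8
af_missing_as_ref = allele_frequency(site, missing_as_ref=True)  # 3 / 12
```

The same calls yield 0.375 or 0.25 depending on that single choice, so comparing observed AFs against expected AFs only makes sense when the treatment of missing records is known.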
Impact on Breast Cancer Risk Prediction
By simulating 10-year and lifetime risk assessments under different scenarios (including age, risk factors, etc.), the study showed that variant sites with significantly deviated AFs lead to small deviations of 1% to 2% in breast cancer risk prediction. Although these deviations are not critical in practice, they are an important reference when discussing the accuracy of new PRS designs.
Improvement Strategies
By using alternative variant sites and proxy alleles, the study showed that the accuracy of AF detection could be improved for some variant sites. For example, for frequently erroneous sites such as rs73754909 and rs79461387, the alternative alleles agreed more closely with the expected AFs.
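This summary does not state how proxies were selected. Assuming that a proxy means a nearby variant in strong linkage disequilibrium (LD) with the hard-to-genotype target, a candidate can be screened with the standard r² statistic computed from phased haplotypes. The sketch below is generic; the function name and haplotype data are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic variants.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per phased
    haplotype (two entries per diploid sample).
    Returns 0.0 if either variant is monomorphic, where r^2 is undefined.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom

# Hypothetical haplotypes for a target variant and two candidate proxies.
target = [1, 1, 0, 0, 1, 0, 0, 0]
good_proxy = [1, 1, 0, 0, 1, 0, 0, 0]   # always co-inherited with the target
poor_proxy = [1, 0, 1, 0, 1, 0, 1, 0]   # nearly independent of the target

r2_good = ld_r2(target, good_proxy)
r2_poor = ld_r2(target, poor_proxy)
```

A proxy with r² close to 1 carries nearly the same information as the target, whereas a low r² means the substitution would itself distort the AF and the resulting score.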
Summary and Outlook
This study systematically evaluated the technical limitations of NGS in PRS genotyping, demonstrating both the shortcomings of existing methods and strategies for improvement. For new breast cancer risk prediction models in particular, accurately reproducing variant allele frequencies is crucial for optimizing PRS design. The work is relevant to breast cancer diagnostics and also offers technical guidance for risk scoring in other diseases with a genetic component.
Research Significance
As NGS becomes widely used in clinical genetic testing, improving detection accuracy and reproducibility is essential. This study identifies technical challenges and directions for improvement, which supports individual risk prediction and can guide the design and application of future PRSs.