A Two-Phase Epigenome-Wide Four-Way Gene–Smoking Interaction Study of Overall Survival for Early-Stage Non-Small Cell Lung Cancer

Study on the Four-Way Gene-Smoking Interaction and Survival in Early-Stage Non-Small Cell Lung Cancer

Research Background

Lung cancer is one of the most common malignancies worldwide and the leading cause of cancer-related death. According to global cancer statistics, about 2.5 million new cases are diagnosed and 1.8 million deaths occur each year. Non-small cell lung cancer (NSCLC) accounts for the majority of cases, and its main histological subtypes are lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). Despite advances in treatment, survival of early-stage NSCLC patients remains poor, with a 3-year survival rate of 13–40% and a 5-year survival rate of around 25%. This heterogeneity in clinical outcomes suggests that many mechanisms of lung cancer progression remain undiscovered.

Epigenetic alterations, particularly DNA methylation, are recognized as key regulators of tumorigenesis and cancer progression. DNA methylation is a reversible epigenetic modification in which a methyl group is added to cytosine to form 5-methylcytosine (5mC), most often at CpG sites. Beyond its potential for early tumor detection, DNA methylation provides critical insights into tumor metastasis and prognosis and can help guide immunotherapy and chemotherapy strategies for NSCLC patients. In addition, gene–gene (G×G) and gene–environment (G×E) interactions are crucial for elucidating the molecular mechanisms of complex diseases. Smoking, a major environmental exposure, has been shown to alter DNA methylation patterns and thereby affect cancer progression.

Previous studies have identified two-way and three-way interactions between smoking and DNA methylation, but such low-order interactions cannot fully explain the complex patterns of cancer progression. Investigating higher-order interactions is therefore essential for uncovering the molecular mechanisms of lung cancer progression.

Research Source

This study was conducted by an international collaborative team. The primary authors include Leyi Chen, Xiang Wang, Ning Xie, and others, from institutions including the School of Public Health at Nanjing Medical University, Harvard T.H. Chan School of Public Health, Oslo University Hospital, and Lund University. The study, titled “A two-phase epigenome-wide four-way gene–smoking interaction study of overall survival for early-stage non-small cell lung cancer,” was published in Molecular Oncology in 2024.

Research Process and Results

Study Design

The study employed a two-phase design to explore the relationship between four-way gene-smoking interactions and overall survival in early-stage NSCLC patients. The first phase, the discovery phase, utilized DNA methylation data from four international research centers in the USA (Harvard), Spain, Norway, and Sweden. The second phase, the validation phase, used data from The Cancer Genome Atlas (TCGA) database.

Study Population and Data

The study included DNA methylation data from early-stage (stage I and II) NSCLC patients from five international study centers (the four discovery-phase centers plus TCGA). The discovery phase comprised 524 patients (425 LUAD and 99 LUSC), and the validation phase included 468 patients (227 LUAD and 241 LUSC). All studies were approved by the respective institutional review boards, and written informed consent was obtained from all participants.

Data Analysis Methods

The analysis was based on Cox proportional hazards models adjusted for factors such as age, sex, smoking status, clinical stage, and study center. A hill-climbing algorithm was used to search systematically from low-order to high-order interactions. Multiple testing was corrected with the false discovery rate (FDR) method, which controls the expected proportion of false positives among the results declared significant at 5%.
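The hill-climbing idea can be sketched as a greedy, order-by-order search: start from significant lower-order terms, try adding one more factor to each, and keep an extension only if it improves on its parent. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hill_climb_interactions`, the rule "the child's p-value must pass a threshold and beat its parent's", and the toy `score` lookup are all hypothetical. In a real analysis, `score` would fit a covariate-adjusted Cox model containing the candidate interaction term and return that term's p-value.

```python
from typing import Callable, FrozenSet, Iterable, Set

Term = FrozenSet[str]  # an interaction term = set of factor names


def hill_climb_interactions(
    seeds: Iterable[Term],
    candidates: Iterable[str],
    score: Callable[[Term], float],
    max_order: int = 4,
    threshold: float = 0.05,
) -> Set[Term]:
    """Hypothetical greedy search from low- to high-order interactions.

    `score` returns a p-value for a term (lower is better). Each retained
    term is extended by one factor at a time, and an extension is kept only
    if it passes `threshold` and improves on its parent's p-value.
    """
    candidates = list(candidates)
    current: Set[Term] = {t for t in seeds if score(t) <= threshold}
    found: Set[Term] = set(current)
    while current:
        next_level: Set[Term] = set()
        for term in current:
            if len(term) >= max_order:
                continue  # already at the highest order we search
            parent_p = score(term)
            for factor in candidates:
                if factor in term:
                    continue
                child = term | {factor}
                child_p = score(child)
                # climb only when the higher-order term beats its parent
                if child_p <= threshold and child_p < parent_p:
                    next_level.add(child)
        found |= next_level
        current = next_level
    return found


# Toy p-values (made up) for terms built from pack-years (PY) and CpGs A-C.
toy_p = {
    frozenset({"PY", "A"}): 0.01,
    frozenset({"PY", "A", "B"}): 0.005,
    frozenset({"PY", "A", "C"}): 0.2,
    frozenset({"PY", "A", "B", "C"}): 0.001,
}
result = hill_climb_interactions(
    seeds=[frozenset({"PY", "A"})],
    candidates=["A", "B", "C"],
    score=lambda t: toy_p.get(frozenset(t), 1.0),
)
print(sorted(sorted(t) for t in result))
```

The main appeal of this kind of search is that only extensions of already-retained terms are evaluated, so the number of models fitted grows far more slowly than an exhaustive scan of every possible four-way combination.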

Key Findings

In the discovery phase, the study identified 39 significant four-way interactions (FDR-q ≤ 0.05), of which only one was also significant in the validation phase (p ≤ 0.05). This interaction involved pack-years of smoking, cg05293407 (TRIM27), cg00060500 (KIAA0226), and cg16658473 (SHISA9). Specifically, the four-way interaction of cg16658473 (SHISA9) with pack-years of smoking, cg05293407 (TRIM27), and cg00060500 (KIAA0226) was significantly associated with the overall survival of NSCLC patients (discovery set: HR_interaction = 0.9993, 95% CI: 0.9990–0.9996, p = 3.08 × 10⁻⁶, FDR-q = 0.027; validation set: HR_interaction = 0.9992, 95% CI: 0.9986–0.9998, p = 0.014; combined data: HR_interaction = 0.9995, 95% CI: 0.9993–0.9997, p = 3.06 × 10⁻⁶).
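The two-phase filter (FDR-q ≤ 0.05 in discovery, then p ≤ 0.05 in validation) can be sketched as follows. This assumes the FDR method is the Benjamini-Hochberg procedure, which the summary does not name explicitly; `bh_qvalues` and `two_phase_screen` are hypothetical helper names, and the example inputs are made-up p-values, not study data.

```python
from typing import Dict, List


def bh_qvalues(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def two_phase_screen(
    discovery_p: Dict[str, float],
    validation_p: Dict[str, float],
    fdr: float = 0.05,
    alpha: float = 0.05,
) -> List[str]:
    """Keep terms with FDR-q <= fdr in discovery AND p <= alpha in validation."""
    names = list(discovery_p)
    q = bh_qvalues([discovery_p[n] for n in names])
    discovered = [n for n, qv in zip(names, q) if qv <= fdr]
    return [n for n in discovered if validation_p.get(n, 1.0) <= alpha]


# Made-up p-values for three candidate four-way terms.
disc = {"term_a": 0.0001, "term_b": 0.0002, "term_c": 0.9}
val = {"term_a": 0.01, "term_b": 0.3}
print(two_phase_screen(disc, val))  # prints ['term_a']
```

Requiring replication in an independent cohort is what shrinks a list of discovery hits, like the 39 reported here, down to the few that hold up.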

Interpretation of Results

Using 3D visualization, the study showed how the effect of cg16658473 (SHISA9) varied across levels of smoking intensity and cg05293407 (TRIM27) methylation. In the low-methylation subgroup of cg00060500 (KIAA0226), the risk associated with cg16658473 (SHISA9) increased with higher smoking intensity and lower cg05293407 (TRIM27) methylation. In the high-methylation subgroup of cg00060500 (KIAA0226), this pattern was completely reversed.
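To see how one CpG's effect can reverse direction, it helps to write out the conditional log hazard ratio implied by a four-way interaction. The sketch assumes a hierarchical Cox model containing every main effect and every lower-order interaction among S (cg16658473, SHISA9), P (pack-years), T (cg05293407, TRIM27), and K (cg00060500, KIAA0226). The summary does not state the exact model terms, and the coefficients below are hypothetical. Because each variable enters any product term at most once, the log-HR for a one-unit increase in S is the sum, over all terms containing S, of the coefficient times the product of the other variables in that term. `conditional_log_hr` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet


def conditional_log_hr(
    beta: Dict[FrozenSet[str], float], focal: str, values: Dict[str, float]
) -> float:
    """Log-HR for a one-unit increase in `focal`, other variables held fixed.

    `beta` maps each model term (a set of variable names) to its Cox
    coefficient; missing terms count as 0. Each term containing `focal`
    contributes coef * (product of the other variables in that term).
    """
    others = [v for v in values if v != focal]
    total = 0.0
    for k in range(len(others) + 1):
        for combo in combinations(others, k):
            coef = beta.get(frozenset((focal, *combo)), 0.0)
            prod = 1.0
            for v in combo:
                prod *= values[v]
            total += coef * prod
    return total


# Hypothetical coefficients: a main effect for S and a negative S*P*T*K term.
beta = {frozenset({"S"}): 0.1, frozenset({"S", "P", "T", "K"}): -0.01}
for k_level in (0.0, 10.0):
    lhr = conditional_log_hr(beta, "S", {"P": 2.0, "T": 1.0, "K": k_level})
    print(f"K={k_level}: log-HR={lhr:+.2f}, HR={math.exp(lhr):.3f}")
```

With these made-up coefficients, the same one-unit change in S is harmful (positive log-HR) when K is 0 and protective (negative log-HR) when K is 10. This is qualitatively the kind of subgroup reversal described for KIAA0226, and a 3D plot of this quantity over P and T for fixed K is one way to produce such surfaces.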

Prognostic Prediction Model

The study also built a prognostic prediction model combining clinical variables with the four-way interaction, and evaluated its predictive ability using time-dependent ROC curves and the area under the curve (AUC). Adding the four-way interaction significantly improved the model's AUC (3-year AUC = 0.709; 5-year AUC = 0.735), indicating better predictive accuracy.
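A time-dependent AUC at a horizon t (for example 3 or 5 years) compares subjects who died by t (cases) with those still followed after t (controls). The sketch below is a simplified, unweighted cumulative/dynamic AUC that drops subjects censored before t. It is an assumption for illustration only: standard estimators (for example Heagerty's Kaplan-Meier-based or Uno's inverse-probability-weighted approaches) correct for censoring, and the summary does not say which estimator the study used. `cumulative_dynamic_auc` is a hypothetical name.

```python
from typing import Sequence


def cumulative_dynamic_auc(
    time: Sequence[float],
    event: Sequence[int],
    risk: Sequence[float],
    t: float,
) -> float:
    """Unweighted cumulative/dynamic AUC at horizon t (simplified sketch).

    Cases: event observed at or before t. Controls: follow-up beyond t.
    Subjects censored at or before t are dropped; no censoring weights.
    Returns the fraction of case-control pairs in which the case has the
    higher predicted risk (ties count 0.5).
    """
    cases = [r for ti, ei, r in zip(time, event, risk) if ti <= t and ei == 1]
    controls = [r for ti, r in zip(time, risk) if ti > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at t")
    concordant = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                concordant += 1.0
            elif rc == rn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy follow-up in years with made-up risk scores.
time = [1.0, 2.0, 4.0, 5.0, 6.0]
event = [1, 1, 0, 1, 0]
risk = [0.9, 0.7, 0.8, 0.3, 0.1]
print(cumulative_dynamic_auc(time, event, risk, t=3.0))
```

Comparing this value for a model with and without the interaction term, at the same horizon, is the comparison the AUC figures above summarize.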

Research Conclusions and Significance

The study is the first to identify a significant four-way gene–smoking interaction at the epigenome level, involving cg16658473 (SHISA9), cg05293407 (TRIM27), cg00060500 (KIAA0226), and pack-years of smoking. This finding provides a new epigenetic biomarker for NSCLC prognosis and offers insight into the complex molecular mechanisms underlying lung cancer progression. The prognostic prediction model developed in the study may also support clinical decision-making.

Research Highlights

  1. Discovery of High-Order Interactions: The study is the first to identify a four-way gene-smoking interaction at the epigenome level, offering new insights into the molecular mechanisms of lung cancer.
  2. Improvement in Prognostic Prediction Models: The inclusion of the four-way interaction significantly enhanced the accuracy of the prognostic prediction model, providing robust support for clinical decision-making.
  3. Integration of Multi-Center Data: The study integrated data from multiple international research centers, enhancing the reliability and generalizability of the results.

Additional Valuable Information

The study also conducted functional annotation and gene enrichment analysis, revealing that genes associated with cg16658473 (SHISA9) were significantly enriched in immune-related pathways, such as NF-κB, T-cell, and B-cell signaling pathways. This further highlights the critical role of immune regulation in lung cancer progression.
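Pathway enrichment is commonly assessed with an over-representation test based on the hypergeometric distribution. The summary does not specify which test or tool the study used, so the function below is only a sketch of that common approach: it returns the probability of observing at least k pathway genes in a gene list of size n when K of N background genes belong to the pathway. The function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric(N background genes, K pathway genes, n genes drawn);
    k is the observed number of pathway genes in the query list.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Made-up example: 20,000 background genes, a 200-gene pathway,
# and a 100-gene query list containing 8 pathway genes.
print(hypergeom_enrichment_p(N=20000, K=200, n=100, k=8))
```

Because many pathways are usually tested at once, these p-values are typically adjusted for multiple testing before a pathway is called significantly enriched.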


Through this study, the researchers have uncovered new mechanisms underlying survival in NSCLC patients and laid theoretical and practical groundwork for future lung cancer treatment and prognostic assessment.