APNet: An Explainable Sparse Deep Learning Model to Discover Differentially Active Drivers of Severe COVID-19

Academic Background

The COVID-19 pandemic has had a significant impact on global public health systems. Although the pandemic has somewhat subsided, its complex immunopathological mechanisms, long-term sequelae (such as “long COVID”), and the potential for similar threats in the future continue to drive in-depth research. Severe COVID-19 cases are often accompanied by serious symptoms such as a “cytokine storm,” acute respiratory distress syndrome (ARDS), and multi-organ failure, necessitating more accurate predictive models and biomarkers to guide clinical decision-making.

Traditional machine learning (ML) and deep learning (DL) models have shown promise in analyzing high-throughput omics data but often lack biological interpretability, making it difficult to reveal nonlinear protein dynamics (e.g., post-translational modifications) and complex signaling pathway regulatory mechanisms. To address this, the authors developed APNet (Activity PASNet), a model that combines differential activity analysis with biologically informed sparse deep learning, aiming to discover drivers of severe COVID-19 through explainable predictions.

Source of the Paper

This paper was co-authored by George I. Gavriilidis, Vasileios Vasileiou, Stella Dimitsaki, and others, affiliated with the Institute of Applied Biosciences at the Centre for Research and Technology Hellas, the Department of Molecular Biology and Genetics at Democritus University of Thrace, and the University Research Institute of Maternal and Child Health and Precision Medicine at the National and Kapodistrian University of Athens, among other institutions. The paper was published on February 8, 2025, in the journal Bioinformatics, titled “APNet, an explainable sparse deep learning model to discover differentially active drivers of severe COVID-19.”

Research Process

1. Overview of the APNet Framework

APNet is a modular computational framework designed for explainable patient classification and biological hypothesis generation through biologically informed deep learning models. Its main tasks include:

  - Supervised clustering: Distinguishing severe from non-severe COVID-19 cases.
  - Biological mechanism generation: Revealing potential regulatory networks and signaling pathways by constructing protein-pathway bipartite graphs.

Core components of APNet include:

  - NetBID2 and scMINER tools: Reverse-engineer protein/gene regulatory networks based on the SJARACNe algorithm, transforming expression matrices into activity matrices.
  - PASNet model: A biologically informed sparse neural network used for supervised clustering and preliminary biological interpretability analysis.
  - SHAP values: Enhance model interpretability by identifying the most predictive molecules.
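To make the idea of a biologically informed sparse layer concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the connection from protein i to pathway j is kept only when a binary membership mask allows it, and it uses a sigmoid activation purely for illustration. The function name `masked_forward` and all toy values are hypothetical.

```python
import math


def masked_forward(x, weights, mask, bias):
    """Forward pass of one sparse layer.

    pathway_j = sigmoid(sum_i x[i] * weights[i][j] * mask[i][j] + bias[j])
    A mask entry of 0 removes the connection, whatever its weight is.
    """
    outputs = []
    for j in range(len(bias)):
        z = bias[j]
        for i, x_i in enumerate(x):
            z += x_i * weights[i][j] * mask[i][j]
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


# Hypothetical toy setup: 3 proteins feeding 2 pathway nodes.
# mask[i][j] == 1 means "protein i is annotated to pathway j" (assumed).
mask = [[1, 0],
        [1, 1],
        [0, 1]]
weights = [[0.5, 9.0],    # the 9.0 entries are masked out and have no effect
           [-0.25, 0.75],
           [9.0, 1.0]]
bias = [0.0, 0.0]
activity = [2.0, 4.0, -1.0]  # one sample's activity values (illustrative)

pathway_scores = masked_forward(activity, weights, mask, bias)
```

Because masked weights never contribute, each pathway node can be traced back to a known set of member proteins, which is what makes this kind of layer easier to interpret than a dense one.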

2. Data Processing and Activity Transformation

The study used three COVID-19 plasma proteomics datasets (MGH, Mayo, Stanford) and two single-cell RNA sequencing (scRNA-seq) datasets. Expression matrices were transformed into activity matrices using the NetBID2 and scMINER tools, which capture protein/gene regulatory relationships. Activity transformation significantly improved the signal-to-noise ratio and reduced batch effects.
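The expression-to-activity step can be illustrated with a simplified sketch. It assumes that a driver's activity in a sample is the signed average of the standardized expression of its network targets. This is an illustrative simplification rather than the exact NetBID2 or scMINER computation, and the names `zscore_rows`, `infer_activity`, and the toy network are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardize each molecule's expression across samples."""
    scaled = {}
    for name, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # Constant rows carry no information; map them to zeros.
        scaled[name] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def infer_activity(expr, network):
    """Activity of a driver = signed mean of its targets' z-scores.

    network maps driver -> {target: sign}, where sign is +1 (activation)
    or -1 (repression). Targets missing from expr are skipped.
    """
    z = zscore_rows(expr)
    n_samples = len(next(iter(expr.values())))
    activity = {}
    for driver, targets in network.items():
        usable = [(t, sign) for t, sign in targets.items() if t in z]
        if not usable:
            continue
        activity[driver] = [
            sum(sign * z[t][s] for t, sign in usable) / len(usable)
            for s in range(n_samples)
        ]
    return activity


# Hypothetical toy data: 3 targets measured in 3 samples.
expr = {"T1": [1, 2, 3], "T2": [3, 2, 1], "T3": [5, 5, 5]}
network = {"D1": {"T1": 1, "T2": -1}, "D2": {"T3": 1, "MISSING": 1}}
activity = infer_activity(expr, network)
```

Averaging over many targets is one intuitive reason such a transformation can raise the signal-to-noise ratio: noise in individual measurements tends to cancel, while a coordinated shift in the targets does not.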

3. Differential Activity Analysis and Pathway Enrichment

After activity transformation, differentially active proteins/genes (DAPs/DAGs) between severe and non-severe cases were calculated, and pathway enrichment analysis was performed using the Enrichr Knowledge Graph (KG). Results showed that activity analysis identified more COVID-19-related signaling pathways, such as inflammatory responses, apoptosis, and viral infection.
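Pathway enrichment of a list of differentially active molecules is commonly scored with an over-representation test. The sketch below computes a one-sided hypergeometric p-value. It is a generic illustration under that assumption, not a reproduction of the Enrichr KG scoring, and the function name and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(n_universe, n_pathway, n_hits, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_universe: number of molecules that could have been selected
    n_pathway:  number of those annotated to the pathway
    n_hits:     number of differentially active molecules
    n_overlap:  number of hits that belong to the pathway
    """
    upper = min(n_pathway, n_hits)
    total = comb(n_universe, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy case: 10 molecules, 4 in the pathway, 3 hits, all in it.
p_value = hypergeom_pvalue(10, 4, 3, 3)
```

A small p-value means the overlap between the hit list and the pathway is larger than expected by chance. In practice, p-values from many pathways would also need multiple-testing correction.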

4. Model Training and Validation

The APNet model was trained on the MGH dataset and validated and tested on the Mayo and Stanford datasets. Its AUC (area under the curve) and F1 scores were significantly higher than those of the baseline models (e.g., random forest and the original PASNet model).
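For readers less familiar with the two metrics, the following sketch computes them from scratch. It assumes that AUC refers to the area under the ROC curve and that severe cases are labeled 1. The labels and scores are toy values and are not results from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(labels, preds):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example (illustrative only): 1 = severe, 0 = non-severe.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # threshold chosen arbitrarily

auc = roc_auc(labels, scores)
f1 = f1_score(labels, preds)
```

AUC is threshold-free and measures ranking quality, whereas F1 depends on the chosen decision threshold, so reporting both gives a fuller picture of classifier behavior.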

5. Biological Hypothesis Generation

By constructing protein-pathway bipartite graphs, APNet revealed key signaling pathways and regulatory networks associated with severe COVID-19. For example, ACAA1 (acetyl-CoA acyltransferase 1) was identified as an important predictive driver, and its regulatory relationships with proteins like IL-6 and CKAP4 were found to be significant in the immunopathology of COVID-19.
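A protein-pathway bipartite graph of this kind can be represented with two adjacency maps. The sketch below is a generic illustration that assumes pathway membership lists and a set of top-ranked proteins are already available. All identifiers (`build_bipartite`, `P1`, `pathway_A`, and so on) are hypothetical placeholders and do not reflect the study's findings.

```python
from collections import defaultdict


def build_bipartite(memberships, top_proteins):
    """Link each top-ranked protein to the pathways that contain it.

    memberships:  pathway name -> iterable of member proteins
    top_proteins: set of proteins to keep (e.g., the most predictive ones)
    Returns two adjacency maps, one per side of the bipartite graph.
    """
    protein_to_pathways = defaultdict(set)
    pathway_to_proteins = defaultdict(set)
    for pathway, proteins in memberships.items():
        for protein in proteins:
            if protein in top_proteins:
                protein_to_pathways[protein].add(pathway)
                pathway_to_proteins[pathway].add(protein)
    return dict(protein_to_pathways), dict(pathway_to_proteins)


# Hypothetical placeholder annotations, not real pathway data.
memberships = {
    "pathway_A": ["P1", "P2", "P9"],
    "pathway_B": ["P2", "P3"],
    "pathway_C": ["P9"],
}
top_proteins = {"P1", "P2", "P3"}

by_protein, by_pathway = build_bipartite(memberships, top_proteins)

# Pathways ranked by how many top proteins they contain (ties by name).
ranked = sorted(by_pathway, key=lambda p: (-len(by_pathway[p]), p))
```

Proteins that connect to several pathways, and pathways that collect several top proteins, are natural starting points for hypothesis generation.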

Key Results

  1. Data Distribution Alignment and Batch Effect Reduction: Activity transformation significantly improved data distribution alignment across different datasets and reduced batch effects.
  2. Identification of Differentially Active Drivers: Activity analysis identified 333 common differentially active proteins (DAPs), significantly more than traditional expression analysis.
  3. Superior Model Performance: APNet excelled in predicting severe COVID-19 cases, with AUC and F1 scores significantly higher than other baseline models.
  4. Biological Hypothesis Generation: APNet revealed multiple signaling pathways and regulatory networks associated with severe COVID-19, such as inflammatory responses, apoptosis, and viral infection.

Conclusion and Significance

As an explainable deep learning framework, APNet not only predicts severe COVID-19 cases efficiently but also reveals potential signaling pathways and regulatory networks through biological hypothesis generation. Its innovation lies in combining activity analysis with biologically informed deep learning, which significantly enhances both biological interpretability and predictive performance. In the future, APNet could be applied to multi-omics data analysis of other complex diseases (e.g., cancer, neurodegenerative diseases), providing new tools and insights for precision medicine.

Research Highlights

  1. Innovative Methodology: APNet is the first to combine activity analysis with biologically informed deep learning models, addressing the limitations of traditional models in biological interpretability.
  2. Efficient Predictive Performance: APNet performed exceptionally well across multiple COVID-19 datasets, significantly outperforming other baseline models.
  3. Biological Mechanism Revelation: By constructing protein-pathway bipartite graphs, APNet revealed key signaling pathways and regulatory networks associated with severe COVID-19, providing important insights for clinical decision-making.

Other Valuable Information

APNet’s R and Python scripts are open-source and available on GitHub (https://github.com/biodataanalysisgroup/apnet), providing a convenient tool and reference for other researchers. Additionally, the datasets used in the study are publicly available on the Zenodo platform, facilitating replication and further research.

With APNet, researchers can better understand the immunopathological mechanisms of COVID-19 and develop new strategies and methods for the prevention and control of similar pandemics in the future.