Supervised Latent Factor Modeling Isolates Cell-Type-Specific Transcriptomic Modules That Underlie Alzheimer's Disease Progression

Overview

A paper titled “Supervised latent factor modeling isolates cell-type-specific transcriptomic modules that underlie Alzheimer’s disease progression” was published in Communications Biology. Its authors include Liam Hodgson, Yue Li, Yasser Iturria-Medina, Jo Anne Stratton, Guy Wolf, Smita Krishnaswamy, David A. Bennett, and Danilo Bzdok, who are affiliated with institutions such as McGill University, Université de Montréal, Yale University, and Rush University Medical Center. The paper uses supervised latent factor modeling to identify cell-type-specific transcriptomic modules related to Alzheimer’s disease (AD).

Research Background

Late-onset Alzheimer’s disease (AD) is a progressive neurodegenerative disease in which changes in the brain begin years before symptoms appear. Although neuron loss is a classic feature of AD, genome-wide association studies (GWAS) and recent single-nucleus RNA sequencing (snRNA-seq) studies indicate that glial cells, especially microglia, play a crucial role in the pathophysiology of AD. This research aims to find AD-predictive modules distributed across the major brain cell types by applying pattern-learning algorithms to the entire transcriptome.

Research Objective

The objective of the research is to design and implement a supervised latent factor framework that makes snRNA-seq transcriptomic effects more interpretable at the granularity of AD-specific gene expression programs. With this method, the researchers aim to identify disease-driving gene modules in specific cell types and to elucidate the biological significance of these modules in predicting AD.

Research Method

The study employed supervised latent factor modeling, using a Partial Least Squares Discriminant Analysis (PLS-DA) model to analyze single-nucleus RNA sequencing data from the ROSMAP (Religious Orders Study and Rush Memory and Aging Project) cohort. The specific steps are as follows:

  1. Data Preparation & Preprocessing: Samples came from the ROSMAP project, comprising 48 age- and gender-matched donors and approximately 70,000 cells.
  2. Model Training: A separate PLS-DA model was trained for each cell type (excitatory neurons, inhibitory neurons, oligodendrocytes, oligodendrocyte precursor cells, microglia, and astrocytes) to distinguish cells of AD patients from cells of non-AD donors based on gene expression data.
  3. Module Identification: Gene Set Enrichment Analysis (GSEA) against established gene program databases was carried out for each PLS-DA module to identify the biological processes and molecular pathways related to AD prediction.
  4. Verification & Evaluation: Model performance and module identification accuracy were evaluated using 5-fold cross-validation, and pseudotime ordering was used to infer disease progression.

Research Findings

Using this approach, the research team reported the following main results:

  1. Cell-type-specific gene modules: In all six cell types (excitatory neurons, inhibitory neurons, etc.), cell-type-specific modules composed of a small subset of genes were found. These modules can effectively distinguish healthy cells from AD cells. For example, the main predictive module found in microglia is enriched for gene programs related to microglia activation, phagocytosis, and response to amyloid beta plaques. The primary predictive module in astrocytes is related to extracellular matrix organization and cell junction assembly, among other processes.
  2. Interaction Analysis: Interactions between modules in different cell types were further analyzed. There was evidence of highly coordinated gene program activity between excitatory and inhibitory neurons, as well as significant interactions with astrocytes. This suggests that functional connections and response mechanisms differ between cell types in AD.
  3. Disease Progression Trajectory: Using pseudotime ordering, the disease progression trajectory of AD patients was inferred. The validity of this trajectory was checked against known clinical and pathological indicators (such as Braak stage and CERAD scores). The results showed that pseudotime was strongly correlated, in absolute value, with these external disease indicators.
  4. GWAS Risk Gene Localization: The study further examined the 38 known AD risk loci from genome-wide association studies (GWAS). Several risk genes were found to contribute significantly to the modules of specific cell types. For example, the APOE gene mainly appears in the modules of astrocytes, microglia, and oligodendrocyte precursor cells, while the PICALM gene appears in modules of all cell types.
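
The pseudotime validation described above can be sketched as a rank correlation between inferred pseudotime and an ordinal pathology score. This is only an illustration under stated assumptions: the article does not say which correlation statistic the authors used, so Spearman's rho is an assumption, the per-donor Braak stages and pseudotime values are synthetic, and abs_spearman is a hypothetical helper. The absolute value is taken because the direction of a pseudotime axis is arbitrary.

```python
import numpy as np
from scipy.stats import spearmanr


def abs_spearman(pseudotime, stage):
    """Absolute Spearman correlation between pseudotime and a staging score.

    The sign is dropped because a pseudotime axis may run from healthy to
    diseased or the reverse; only the strength of the ordering matters.
    """
    rho, _ = spearmanr(pseudotime, stage)
    return abs(float(rho))


# Synthetic example: 48 hypothetical donors with Braak stages 0..6.
rng = np.random.default_rng(0)
braak = rng.integers(0, 7, size=48)
# Simulated pseudotime tracks the stage with noise, on a reversed axis.
pseudotime = -(braak + rng.normal(scale=1.0, size=48))

score = abs_spearman(pseudotime, braak)
print(f"|Spearman rho| between pseudotime and Braak stage: {score:.2f}")
```

A rank-based statistic suits ordinal scores such as Braak stage or CERAD, because it does not assume equal spacing between stages.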

Conclusion and Significance

The main conclusion of this study is that supervised latent factor modeling can identify AD-predictive gene modules in specific cell types and reveal their role in disease progression. The approach underscores the value of considering the expression of all genes simultaneously in single-nucleus RNA sequencing data in order to isolate multiple latent gene expression modules, providing a fresh perspective on the pathological mechanisms of AD.

Additionally, the results underscored the crucial role of microglia in the pathogenesis of AD and suggested potential modes of functional coordination among cells, providing a new direction for future research. For example, the results point to activation of TLR2, TLR1, and TLR5 in microglia through the MAPK/ERK signaling pathway. This finding could help in developing potential treatments for AD.

Finally, by linking known AD risk genes to specific cell types and gene program modules, the research showed that these risk genes play distinct roles in different cell types, further supporting the value of single-cell transcriptomic data for AD research.

This study not only expanded the understanding of AD pathogenesis through its multi-level analysis framework but also demonstrated the potential of machine learning in single-cell genomics. The results are expected to provide a theoretical basis for developing new diagnostic and treatment methods.