Contrastive Mapping Learning for Spatial Reconstruction of Single-Cell RNA Sequencing Data
Single-cell RNA sequencing (scRNA-seq) technology enables high-throughput transcriptomic profiling at single-cell resolution, significantly advancing research in cell biology. However, a notable limitation of scRNA-seq is that it requires tissue dissociation, resulting in the loss of the original spatial location information of cells within tissues. Spatial transcriptomics (ST) technology can provide precise spatial gene expression maps, but it faces constraints in terms of the number of genes detected, cost, and the granularity of cell type annotation. Therefore, recovering spatial information in scRNA-seq data has become a significant challenge in current research.
To address this issue, researchers have proposed methods to transfer knowledge between scRNA-seq and ST data through cell correspondence learning, thereby recovering spatial information in scRNA-seq data. However, existing methods have limitations in modeling local and global relationships and integrating cell type information, leading to limited accuracy in spatial mapping.
Source of the Paper
This paper was jointly completed by a research team from City University of Hong Kong, Shantou University, Shantou University Medical College, and South China University of Technology. The main authors include Xindian Wei, Tianyi Chen, Xibiao Wang, and others, with the corresponding authors being Cheng Liu from Shantou University and Hau-San Wong from City University of Hong Kong. The paper was published on February 24, 2025, in the journal Bioinformatics, titled “COME: Contrastive Mapping Learning for Spatial Reconstruction of Single-Cell RNA Sequencing Data.”
Research Process and Results
Research Process
The core of the COME method is to establish a mapping relationship between scRNA-seq and ST data through a contrastive learning framework, thereby recovering spatial information in scRNA-seq data. The research process mainly includes the following steps:
Data Preprocessing
The study used scRNA-seq and ST datasets from three biological systems: Drosophila embryo, mouse primary visual cortex, and human pancreatic cancer. First, the researchers normalized the data so that total gene expression was consistent across cells. Then, the two modalities were aligned by restricting both to the genes they share.
Cell Correspondence Learning
The study employed a shared autoencoder to extract latent representations of scRNA-seq and ST data. By decoding the latent codes of scRNA-seq data, reconstructed spatial data were generated. Additionally, a coefficient layer was introduced to learn the mapping from scRNA-seq to the spatial domain. The coefficient matrix was used to capture the association strength between cells and spatial spots.
Contrastive Learning Module
To enhance the discriminative ability of latent feature representations, the study designed a contrastive learning module. This module includes cell-type contrastive learning and inter-contrastive learning. Cell-type contrastive learning leverages cell type information in scRNA-seq data to bring cells of the same type closer in the latent space. Inter-contrastive learning ensures more consistent latent feature representations between scRNA-seq and ST data through the mapping matrix.
Optimization and Evaluation
The researchers optimized the network model by combining reconstruction loss, coefficient regularization loss, and structural similarity regularization loss. Finally, the effectiveness of the COME method was validated by predicting the spatial locations of scRNA-seq cells. Evaluation metrics included Pearson correlation coefficient (PCC), structural similarity index (SSIM), and root mean square error (RMSE).
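The deterministic parts of the steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `target_sum` value, the `temperature` value, the (cells x spots) orientation of the mapping matrix, and the use of a standard supervised contrastive loss for the cell-type term are all assumptions made for illustration. The autoencoder and its training loop are omitted, and SSIM is left out because it requires a spatial image grid.

```python
import numpy as np


def normalize_total(counts, target_sum=1e4):
    """Scale each row (cell or spot) to the same total expression (target_sum is assumed)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero rows unchanged
    return counts / totals * target_sum


def shared_gene_indices(sc_genes, st_genes):
    """Return shared gene names and their column indices in each modality (names assumed unique)."""
    st_pos = {gene: i for i, gene in enumerate(st_genes)}
    shared = [gene for gene in sc_genes if gene in st_pos]
    sc_idx = [i for i, gene in enumerate(sc_genes) if gene in st_pos]
    st_idx = [st_pos[gene] for gene in shared]
    return shared, sc_idx, st_idx


def cell_type_contrastive_loss(latent, labels, temperature=0.5):
    """Supervised contrastive loss: cells of the same type are positives.

    Assumes non-zero latent vectors and at least one type with two or more cells.
    """
    z = np.asarray(latent, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a cell is never compared with itself
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    labels = np.asarray(labels)
    positives = labels[:, None] == labels[None, :]
    np.fill_diagonal(positives, False)
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0  # anchors without a same-type partner are skipped
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / n_pos[has_pos]).mean())


def project_to_spots(mapping, sc_expr):
    """Weighted average of cell expression per spot; mapping is (n_cells, n_spots), non-negative."""
    mapping = np.asarray(mapping, dtype=float)
    col_sums = mapping.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0  # spots with no mapped cells stay at zero
    return (mapping / col_sums).T @ np.asarray(sc_expr, dtype=float)


def pcc_per_gene(pred, truth):
    """Pearson correlation across spots for each gene (NaN for constant genes)."""
    p = pred - pred.mean(axis=0)
    t = truth - truth.mean(axis=0)
    num = (p * t).sum(axis=0)
    den = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den


def rmse_per_gene(pred, truth):
    """Root mean square error across spots for each gene."""
    return np.sqrt(((pred - truth) ** 2).mean(axis=0))
```

In this sketch, a learned coefficient matrix would be passed to `project_to_spots` to reconstruct spatial expression for the shared genes, and the median of `pcc_per_gene` over held-out genes would correspond to the kind of summary statistic reported in the results.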
Main Results
Spatial Gene Reconstruction
Experiments on Drosophila embryo data showed that COME significantly outperformed other methods in reconstructing spatial gene expression, achieving a significantly higher median PCC than the compared methods. The advantage was most pronounced for genes with clear spatial patterns (e.g., twi, ftz, and cg11208).
Analysis of Cellular Resolution Spatial Transcriptomic Data
In experiments on mouse primary visual cortex data, the COME method excelled in predicting gene spatial patterns. Particularly on the STARmap dataset, the median PCC of COME reached 0.233, a 12% improvement over the second-best method. Additionally, COME accurately inferred the layered distribution of glutamatergic neurons in tissues, consistent with previous research findings.
Spatial Deconvolution
In experiments on human pancreatic cancer data, the COME method successfully distinguished cell type distributions in cancerous and non-cancerous regions. COME accurately predicted the positions of major cell types in the tumor microenvironment (TME) and showed high consistency with the expression patterns of marker genes. In contrast, other methods (e.g., Tangram and GraphST) performed poorly in distinguishing cancerous and non-cancerous regions.
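One common way to turn a cell-to-spot mapping into a deconvolution result is to aggregate mapping weights by cell type within each spot. The sketch below illustrates that idea only; the summary does not state how COME derives cell type distributions, so the function name, the (cells x spots) matrix orientation, and the aggregation rule are assumptions.

```python
import numpy as np


def spot_cell_type_proportions(mapping, cell_types):
    """Aggregate a (n_cells, n_spots) mapping into per-spot cell type proportions.

    Hypothetical helper: each cell contributes its mapping weight to its own
    type, and each spot's row is normalized to sum to 1.
    """
    mapping = np.asarray(mapping, dtype=float)
    labels, codes = np.unique(np.asarray(cell_types).ravel(), return_inverse=True)
    one_hot = np.eye(len(labels))[codes]      # (n_cells, n_types)
    mass = mapping.T @ one_hot                # (n_spots, n_types)
    totals = mass.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0                 # spots with no mapped cells stay at zero
    return labels, mass / totals
```

The returned `labels` array gives the column order of the proportion matrix, so a column can be plotted over spot coordinates and compared against the expression of the corresponding marker genes.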
Conclusions and Significance
The COME method effectively recovers spatial information in scRNA-seq data through a contrastive learning framework, and its accuracy and generalizability were validated across multiple biological systems. This method not only reconstructs spatial gene expression patterns but also infers the distribution of cell types within tissues, providing an essential tool for understanding cellular interactions and functions.
Research Highlights
Contrastive Learning Framework
The COME method introduces contrastive learning into the mapping learning of scRNA-seq and ST data for the first time, significantly improving the accuracy of spatial reconstruction.
Integration of Cell Type Information
Through cell-type contrastive learning, the COME method better captures spatial dependencies between similar cell types, enhancing the biological significance of the model.
Broad Application Value
The successful application of the COME method in multiple biological systems demonstrates its wide potential in spatial transcriptomics research, particularly in tumor microenvironment and neuroscience studies.
Additional Valuable Information
The code for the COME method has been open-sourced on GitHub (https://github.com/cindyway/come), allowing researchers to freely download and use it. Additionally, the research team has provided detailed data preprocessing and evaluation workflows to facilitate replication and extension of the study by other researchers.
This paper provides new ideas and methods for the spatial reconstruction of single-cell transcriptomic data, offering significant scientific value and application prospects. Through the COME method, researchers can gain deeper insights into the spatial distribution and functions of cells within tissues, providing new tools for disease research and treatment.