Trajectory Alignment of Gene Expression Dynamics

The advent of single-cell RNA sequencing (scRNA-seq) has provided unprecedented resolution for studying gene expression dynamics during cell development and differentiation. Because biological processes are complex, developmental trajectories observed under different conditions are often asymmetric (for example, a knockout may halt or divert development that proceeds normally in wild-type cells), which complicates data integration and comparison. Existing methods typically integrate samples from different conditions before clustering cells or inferring a shared trajectory. These methods often perform poorly on asymmetric trajectories and can obscure key differentially expressed genes (DEGs).

To address this issue, the authors developed a new method: Trajectory Alignment of Gene Expression Dynamics (Tragedy). Tragedy aligns independent cell developmental trajectories directly, without dataset integration, and so avoids errors that integration can introduce. It provides a more precise tool for studying cell development under different conditions.

Source of the Paper

This paper was co-authored by Ross F. Laidlaw, Emma M. Briggs, Keith R. Matthews, Amir Madany Mamlouk, Richard McCulloch, and Thomas D. Otto. The authors are affiliated with institutions including the University of Glasgow, University of Edinburgh, Newcastle University, University of Lübeck, and Université de Montpellier. The paper, titled “Trajectory Alignment of Gene Expression Dynamics (Tragedy),” was published in the journal Bioinformatics on March 11, 2025.

Research Process and Results

1. Data Preparation and Interpolated Point Generation

The input to Tragedy consists of two scRNA-seq datasets for which pseudotime values have already been calculated. To reduce computational cost and noise, Tragedy summarizes each trajectory as a user-defined number of interpolated points, each representing the gene expression of the cells within a pseudotime window around it. The window size adapts to cell density: windows are smaller in high-density regions and larger in low-density regions.
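As a rough illustration of density-adaptive interpolation, the sketch below summarizes cells into evenly spaced points along pseudotime. It is not Tragedy's implementation: the function name `interpolate_trajectory`, the parameters `n_points` and `k`, and the choice to size each window by the distance to the k-th nearest cell (with Gaussian weights) are all assumptions made for this example.

```python
import numpy as np


def interpolate_trajectory(pseudotime, expr, n_points=50, k=20):
    """Summarise cells into evenly spaced interpolated points along pseudotime.

    Hypothetical sketch: each point averages its k nearest cells in pseudotime,
    weighted by a Gaussian kernel whose bandwidth is the distance to the k-th
    nearest cell. Dense regions therefore get narrow windows, sparse ones wide.
    """
    pseudotime = np.asarray(pseudotime, dtype=float)
    expr = np.asarray(expr, dtype=float)  # cells x genes
    k = min(k, len(pseudotime))
    positions = np.linspace(pseudotime.min(), pseudotime.max(), n_points)
    points = np.empty((n_points, expr.shape[1]))
    widths = np.empty(n_points)
    for i, t in enumerate(positions):
        dist = np.abs(pseudotime - t)
        nearest = np.argsort(dist)[:k]
        width = max(dist[nearest].max(), 1e-12)  # adaptive window half-width
        weights = np.exp(-0.5 * (dist[nearest] / width) ** 2)
        points[i] = weights @ expr[nearest] / weights.sum()
        widths[i] = width
    return positions, points, widths
```

Fixing the number of cells per window, rather than the window length, is one simple way to obtain the behaviour described above: where cells are packed closely the k neighbours lie within a short pseudotime span, and where they are sparse the window stretches.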

2. Calculating Transcriptomic Dissimilarity

Next, Tragedy calculates the transcriptomic dissimilarity between every pair of interpolated points across the two trajectories and stores the values in a dissimilarity matrix. Dissimilarity can be measured with Euclidean distance, Pearson correlation, or Spearman correlation. By default, Tragedy uses Spearman correlation, transformed so that a score of 0 indicates perfect positive correlation.

3. Identifying the Optimal Alignment Path

Tragedy determines the optimal alignment path between the two trajectories using the Dynamic Time Warping (DTW) algorithm. Because asymmetric trajectories need not start and end at the same developmental state, Tragedy first identifies the start and end points of the alignment path and refines them through bootstrapping. It then selects the alignment path with the lowest average path score.
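The DTW step can be illustrated with a textbook dynamic-programming implementation over the dissimilarity matrix. This is a generic sketch rather than Tragedy's code: `dtw` and `best_alignment` are hypothetical names, the candidate start and end points are supplied by hand, and the bootstrap refinement is not reproduced. It does show how running DTW on a sub-block of the matrix lets an alignment begin and end partway along either trajectory, and how the lowest average path score picks among candidates.

```python
import numpy as np


def dtw(cost):
    """Classic DTW over a dissimilarity matrix; returns the path and its mean cost."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Backtrack from the bottom-right corner to recover the warping path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path, float(np.mean([cost[p] for p in path]))


def best_alignment(cost, starts, ends):
    """Try hypothetical candidate (start, end) index pairs; keep the lowest mean score."""
    best = None
    for sa, sb in starts:
        for ea, eb in ends:
            if ea < sa or eb < sb:
                continue  # end must not precede start on either trajectory
            path, score = dtw(cost[sa:ea + 1, sb:eb + 1])
            path = [(i + sa, j + sb) for i, j in path]  # back to full-matrix indices
            if best is None or score < best[1]:
                best = (path, score)
    return best
```

Averaging over the path rather than summing keeps scores comparable between short and long candidate paths; a summed score would systematically favour shorter alignments.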

4. Aligning Pseudotime

Once the alignment path is determined, Tragedy adjusts the pseudotime of the interpolated points so that matched points have similar pseudotime values. Where a point is matched to several points in the other trajectory (a multi-match), Tragedy rescales the pseudotime values accordingly. Finally, Tragedy transfers the aligned pseudotime from the interpolated points to individual cells, completing the alignment.
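One plausible way to turn a DTW path into aligned pseudotime is sketched below, under stated assumptions: each query point takes the mean pseudotime of the reference points it is matched to (an assumed stand-in for Tragedy's multi-match scaling), and cells then receive aligned values by linear interpolation between neighbouring interpolated points. The function names are hypothetical.

```python
import numpy as np


def align_pseudotime(path, pt_ref, pt_query):
    """Give each query point the mean pseudotime of its matched reference points.

    path holds (reference index, query index) pairs. Averaging over multi-matches
    is an assumption; query points not on the path keep their original values.
    """
    matched = {}
    for i_ref, i_query in path:
        matched.setdefault(i_query, []).append(pt_ref[i_ref])
    aligned = np.asarray(pt_query, dtype=float).copy()
    for i_query, values in matched.items():
        aligned[i_query] = np.mean(values)
    return aligned


def map_to_cells(cell_pt, pt_query, aligned):
    """Transfer aligned pseudotime from interpolated points to individual cells
    by linear interpolation (np.interp requires pt_query in ascending order)."""
    return np.interp(cell_pt, pt_query, aligned)
```

For example, if two reference points at pseudotime 0.75 and 1.0 both match the last query point, that point is assigned 0.875, and a cell lying halfway between two query points receives the midpoint of their aligned values.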

5. Differential Expression Analysis

Tragedy compares gene expression between the two conditions using a sliding-window soft clustering approach. The user defines the number of windows and their degree of overlap, and Tragedy assigns cells to windows accordingly; because windows overlap, a cell can belong to more than one window. Within each window, genes are compared with the Mann-Whitney U test and log2 fold changes (log2FC) are calculated to identify differentially expressed genes.
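The sliding-window comparison can be approximated as follows. This sketch assumes equally wide windows spread evenly over the pooled pseudotime range, with `overlap` given as the fraction of each window shared with the next; the function names and defaults are hypothetical. To stay self-contained, the Mann-Whitney U p-value uses the large-sample normal approximation without tie correction, so an exact or tie-corrected implementation such as `scipy.stats.mannwhitneyu` would be preferable in practice, and multiple-testing correction is omitted.

```python
import math

import numpy as np


def sliding_windows(pseudotime, n_windows=4, overlap=0.5):
    """Soft-assign cells to overlapping windows over the pseudotime range.

    Returns a boolean (cells x windows) matrix; a cell may belong to several windows.
    """
    t = np.asarray(pseudotime, dtype=float)
    lo, hi = t.min(), t.max()
    # n windows of width w, stepped by w * (1 - overlap), exactly span [lo, hi].
    width = (hi - lo) / (1 + (n_windows - 1) * (1 - overlap))
    starts = lo + width * (1 - overlap) * np.arange(n_windows)
    ends = starts + width
    ends[-1] = hi  # guard against rounding at the upper edge
    return (t[:, None] >= starts) & (t[:, None] <= ends)


def _ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    return (np.cumsum(counts) - (counts - 1) / 2.0)[inverse]


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = _ranks(np.concatenate([x, y]))
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return math.erfc(abs(z) / math.sqrt(2.0))


def window_de(expr_a, pt_a, expr_b, pt_b, n_windows=4, overlap=0.5, pseudocount=1.0):
    """Return (window, gene, log2FC of A over B, p-value) for every window and gene."""
    expr_a, expr_b = np.asarray(expr_a, float), np.asarray(expr_b, float)
    member = sliding_windows(np.concatenate([pt_a, pt_b]), n_windows, overlap)
    in_a, in_b = member[: len(expr_a)], member[len(expr_a):]
    results = []
    for w in range(n_windows):
        xa, xb = expr_a[in_a[:, w]], expr_b[in_b[:, w]]
        if len(xa) == 0 or len(xb) == 0:
            continue  # skip windows that lack cells from either condition
        for g in range(expr_a.shape[1]):
            lfc = math.log2((xa[:, g].mean() + pseudocount) / (xb[:, g].mean() + pseudocount))
            results.append((w, g, lfc, mann_whitney_p(xa[:, g], xb[:, g])))
    return results
```

With four windows and 50% overlap, each window covers 40% of the pseudotime range and starts 20% after the previous one, so most cells are counted in two windows.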

Key Results

1. Alignment of Simulated Datasets

The researchers generated three sets of simulated datasets with dyngen: two positive controls and one negative control. Tragedy accurately aligned the trajectories in all simulated datasets, whereas existing methods such as CellAlign and genes2genes (g2g) performed poorly on asymmetric trajectories. In the negative control, Tragedy correctly detected that the two datasets shared no biological process.

2. Application to Real Datasets

The researchers applied Tragedy to real datasets from Trypanosoma brucei and from T-cell development. In the Trypanosoma brucei dataset, Tragedy accurately aligned wild-type (WT) and zc3h20 knockout (KO) cells and identified more differentially expressed genes. Compared with Seurat and tradeSeq, Tragedy was better at recovering biologically relevant genes and processes.

In the T-cell development dataset, Tragedy compared the developmental trajectories of wild-type and bcl11b knockout cells and again identified more differentially expressed genes. Its runtime was also substantially shorter than that of tradeSeq, while it provided richer biological insight.

Conclusion and Significance

Tragedy provides a new tool for trajectory alignment and differential expression analysis in single-cell transcriptomics. Unlike existing methods, it aligns independent cell developmental trajectories directly, without dataset integration, and so avoids errors that integration can introduce. With it, researchers can identify differentially expressed genes and biological processes across conditions more accurately, supporting a better understanding of cell development and differentiation.

Research Highlights

  1. Innovative Alignment Method: Tragedy achieves precise alignment of independent trajectories using interpolated points and the DTW algorithm, avoiding errors introduced by dataset integration.
  2. Efficient Differential Expression Analysis: Tragedy uses a sliding-window soft clustering approach to identify more differentially expressed genes between conditions, providing richer biological insight.
  3. Broad Applicability: Tragedy performs well not only on simulated datasets but also on complex real datasets, such as those from Trypanosoma brucei and T-cell development.

Future Prospects

As single-cell sequencing technology continues to develop, Tragedy is expected to find applications in more biological studies. In particular, combined with Perturb-seq and lineage tracing, Tragedy could enable more precise analysis of gene expression dynamics during cell development, offering new perspectives on the mechanisms of cell fate determination.