Sequence Analysis: DNA Sequence Alignment Using Transformer Models

Academic Background: DNA sequence alignment is a core task in genomics, aiming to map short DNA fragments (reads) to the most probable locations on a reference genome. Traditional methods typically involve two steps: first indexing the genome, then searching efficiently to locate potential positions for the reads. However, with the exponential...

SCICONE: Single-Cell Copy Number Calling and Event History Reconstruction

During tumor development, copy number alterations (CNAs) are key drivers of tumor heterogeneity and evolution. Understanding these variations is crucial for developing personalized cancer diagnostics and therapies. Single-cell sequencing technology offers the highest resolution for copy number analysis, down to the individual cell level. However, l...

FlowPacker: Protein Side-Chain Packing with Torsional Flow Matching

The three-dimensional structure of a protein is determined by its amino acid sequence, and the function of the protein is highly dependent on its three-dimensional structure. The side-chain conformations of proteins play a crucial role in protein folding, protein-protein interactions, and de novo protein design. Accurate prediction of protein side-...

CryoTEN: Efficiently Enhancing Cryo-EM Density Maps Using Transformers

Academic Background: Cryogenic Electron Microscopy (Cryo-EM) is a crucial experimental technique for determining the structures of macromolecules such as proteins. However, the effectiveness of Cryo-EM is often hindered by noise and missing density values caused by experimental conditions such as low contrast and conformational heterogeneity. Althou...

GCLink: A Graph Contrastive Link Prediction Framework for Gene Regulatory Network Inference

Research Background: Gene Regulatory Networks (GRNs) are crucial tools for understanding the complex biological processes within cells. They describe the interactions between Transcription Factors (TFs) and target genes that control gene transcription and regulate cellular behavior. With the advancement of single-cell RNA sequencing (scRNA-s...

ImmunoTar—Integrative Prioritization of Cell Surface Targets for Cancer Immunotherapy

Cancer remains one of the leading causes of death globally. Despite significant advancements in immunotherapy in recent years, such as the successful application of chimeric antigen receptor T-cell (CAR-T) therapy and antibody-drug conjugates (ADCs), the effective identification of cancer-specific surface protein targets remains a major challenge i...

Sul-BERTGRU: An Ensemble Deep Learning Method Integrating Information Entropy-Enhanced BERT and Directional Multi-GRU for S-Sulfhydration Sites Prediction

Background: Post-Translational Modifications (PTMs) are crucial mechanisms for regulating cellular activities, including gene transcription, DNA repair, and protein interactions. Among these, cysteine, a rare amino acid, participates in various PTMs through its thiol group, playing a significant role in redox balance and signal transduc...

Single-Cell Unified Polarization Assessment of Immune Cells

Immune cells undergo cytokine-driven polarization in response to various stimuli, which alters their transcriptional profiles and functional states. This dynamic process plays a central role in immune responses in both health and disease. However, there has been a lack of systematic methods to assess cytokine-driven polarization in single-cell RNA ...

Contrastive Mapping Learning for Spatial Reconstruction of Single-Cell RNA Sequencing Data

Single-cell RNA sequencing (scRNA-seq) technology enables high-throughput transcriptomic profiling at single-cell resolution, significantly advancing research in cell biology. However, a notable limitation of scRNA-seq is that it requires tissue dissociation, resulting in the loss of the original spatial location information of cells within tissues...

Efficient Storage and Regression Computation for Population-Scale Genome Sequencing Studies

With the increasing availability of large-scale population biobanks, the potential of Whole Genome Sequencing (WGS) data in human health and disease research has been significantly enhanced. However, the massive computational and storage demands of WGS data pose significant challenges to research institutions, especially those with limited funding ...