Comprehensive Discovery and Functional Characterization of the Noncanonical Proteome

Academic Background

The completion of the Human Genome Project has greatly advanced our understanding of complex biological processes at the genome-wide level. However, only about 1% of the genome encodes proteins. The vast majority consists of non-coding regions, many of which are transcribed into abundant non-coding RNAs (ncRNAs) such as long non-coding RNAs (lncRNAs). A growing number of studies have shown that some of these ncRNAs contain small open reading frames that are translated into novel peptides with important cellular roles. For example, certain lncRNA-encoded peptides are crucial for muscle physiology, metabolic regulation, and immune responses. Nevertheless, because of technical limitations, the systematic identification and functional characterization of these non-canonical translation products remain a significant challenge.

Gastric cancer, the fifth most commonly diagnosed cancer worldwide, is characterized by high heterogeneity and a lack of early diagnostic markers. Although genomic, transcriptomic, and proteomic studies have mapped its multi-omic features, research on novel peptides in gastric cancer remains limited. Systematically discovering and characterizing these non-canonical peptides could therefore deepen our understanding of genome function and provide new insights for cancer diagnosis and treatment.

Source of the Paper

This paper was co-authored by Chengyu Shi, Fangzhou Liu, Xinwan Su, and colleagues from multiple institutions, including Zhejiang University, The Fourth Affiliated Hospital of Zhejiang University School of Medicine, and Zhejiang University Cancer Center. It was published in Cell Research in 2025 under the title “Comprehensive Discovery and Functional Characterization of the Noncanonical Proteome.”

Research Process

1. Identification of Novel Peptides

The research team first constructed a high-coverage reference library for peptide identification containing 11,668,944 open reading frames (ORFs). To detect novel peptides, they combined ultrafiltration, which enriches low-molecular-weight peptides, with tandem mass spectrometry (MS/MS). By analyzing normal gastric tissues, gastric cancer tissues, and cell lines, the team identified 8,945 previously unannotated peptides, nearly half of which were derived from non-coding RNAs.
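
The ORF reference library described above is, at its core, a translated search database. The Python sketch below shows one simple way such ORFs could be enumerated from a transcript sequence. It is not the authors' pipeline: the ATG-only start rule, the forward-strand scan, keeping only the most upstream ATG per stop codon, and the `MIN_AA` cutoff are illustrative assumptions, and the function names are hypothetical.

```python
# Sketch: enumerate ATG-initiated ORFs in a transcript and translate them,
# as one might when building a custom peptide search database.
# Assumptions (illustrative, not the paper's pipeline): forward strand only,
# ATG starts only, the most upstream ATG per stop codon is kept, ORFs that
# run off the 3' end without a stop codon are discarded, and MIN_AA is a
# hypothetical length cutoff.

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_AA = 10  # hypothetical minimum peptide length


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def find_orfs(seq: str, min_aa: int = MIN_AA) -> list[tuple[int, int, str]]:
    """Return (start, end, peptide) for ORFs in the three forward frames.

    `start` is the 0-based index of the ATG; `end` is exclusive and includes
    the stop codon, which is not part of the translated peptide.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a new ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                peptide = translate(seq[start:pos])
                if len(peptide) >= min_aa:
                    orfs.append((start, pos + 3, peptide))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)


if __name__ == "__main__":
    # Toy transcript with one 12-codon ORF (ATG ... TGG) followed by TAA
    toy = "GGC" + "ATG" + "GCT" * 10 + "TGG" + "TAA" + "CCGTT"
    for start, end, peptide in find_orfs(toy):
        print(start, end, peptide)
```

Real noncanonical-ORF libraries often also include near-cognate start codons such as CTG and scan both strands; extending the sketch would mean widening the start-codon check and running it on the reverse complement.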

2. CRISPR Screening and Functional Validation

To investigate the functions of these peptides, the team performed CRISPR screening and identified 1,161 peptides associated with tumor cell proliferation. Based on screening scores, peptide length, and other criteria, they selected a subset of candidates for validation. Using methods such as FLAG knock-in tagging, they confirmed that these peptides are endogenously expressed and characterized their physiological functions.
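
Pooled CRISPR proliferation screens are commonly scored by how strongly the sgRNAs targeting a feature drop out of the cell population over time. The sketch below illustrates that idea and the subsequent filtering by score and peptide length. The counts-per-million normalization, the median log2 fold change score, the `Guide` record, and every cutoff are hypothetical choices for illustration; dedicated tools such as MAGeCK use more elaborate statistics, and the study's exact scoring may differ.

```python
# Sketch: ORF-level scores from a pooled CRISPR proliferation screen, then
# candidate selection by score and peptide length.
# Assumptions (illustrative, not the paper's pipeline): sgRNA read counts at
# an early (T0) and a late time point, counts-per-million normalization, the
# median sgRNA log2 fold change as the ORF score, and hypothetical cutoffs.
from dataclasses import dataclass
from math import log2
from statistics import median


@dataclass
class Guide:
    orf_id: str     # ORF targeted by this sgRNA
    count_t0: int   # reads at the early time point
    count_end: int  # reads at the end of the screen


def cpm(counts: list[int]) -> list[float]:
    """Normalize raw read counts to counts per million."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def orf_scores(guides: list[Guide], pseudocount: float = 1.0) -> dict[str, float]:
    """Median log2 fold change (end vs T0) of the sgRNAs targeting each ORF.

    Negative scores mean the guides dropped out of the population, i.e.
    disrupting the ORF slowed proliferation.
    """
    t0 = cpm([g.count_t0 for g in guides])
    end = cpm([g.count_end for g in guides])
    per_orf: dict[str, list[float]] = {}
    for guide, before, after in zip(guides, t0, end):
        lfc = log2((after + pseudocount) / (before + pseudocount))
        per_orf.setdefault(guide.orf_id, []).append(lfc)
    return {orf: median(lfcs) for orf, lfcs in per_orf.items()}


def select_candidates(
    scores: dict[str, float],
    lengths: dict[str, int],
    max_score: float = -1.0,  # hypothetical depletion cutoff
    min_aa: int = 10,         # hypothetical peptide length window
    max_aa: int = 100,
) -> list[str]:
    """ORFs depleted at or below `max_score` whose peptide length lies in
    [min_aa, max_aa], most strongly depleted first."""
    hits = [
        orf for orf, score in scores.items()
        if score <= max_score and min_aa <= lengths[orf] <= max_aa
    ]
    return sorted(hits, key=lambda orf: scores[orf])


if __name__ == "__main__":
    guides = [
        Guide("ORF_A", 1000, 100), Guide("ORF_A", 1000, 200),    # depleted
        Guide("ORF_B", 1000, 1000), Guide("ORF_B", 1000, 1100),  # not depleted
        Guide("ORF_C", 1000, 150), Guide("ORF_C", 1000, 150),    # depleted, but too long
    ]
    scores = orf_scores(guides)
    print(select_candidates(scores, {"ORF_A": 45, "ORF_B": 30, "ORF_C": 250}))
```

Taking the median across several sgRNAs per ORF keeps a single unusually effective or off-target guide from dominating the score.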

3. AI-Based Structure Prediction and Peptide-Protein Interaction Network Analysis

To reveal the potential regulatory mechanisms of these peptides, the team built a framework that combines artificial intelligence (AI)-based structure prediction with peptide-protein interaction networks. Analysis of the top 100 candidate peptides showed that these cancer-related peptides localize to diverse subcellular compartments and participate in organelle-specific processes.
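
One common way to turn a predicted interaction network into localization hypotheses is guilt by association: a peptide is assigned the compartment shared by most of its confident partners. The following sketch assumes that each predicted complex carries an interface confidence score such as the ipTM reported by AlphaFold-Multimer; the 0.6 cutoff, the majority-vote rule, and all identifiers are hypothetical and not taken from the paper.

```python
# Sketch: assign each peptide a likely compartment from its confidently
# predicted protein partners ("guilt by association").
# Assumptions (hypothetical, not the paper's framework): every predicted
# peptide-protein complex carries an interface confidence score such as the
# ipTM reported by AlphaFold-Multimer; IPTM_CUTOFF is an illustrative
# threshold; partner localizations come from an external annotation table.
from collections import Counter, defaultdict

IPTM_CUTOFF = 0.6  # hypothetical confidence threshold


def build_network(
    predictions: list[tuple[str, str, float]], cutoff: float = IPTM_CUTOFF
) -> dict[str, set[str]]:
    """Map each peptide to partners whose predicted complex passes `cutoff`."""
    network: dict[str, set[str]] = defaultdict(set)
    for peptide, protein, iptm in predictions:
        if iptm >= cutoff:
            network[peptide].add(protein)
    return dict(network)


def infer_localization(
    network: dict[str, set[str]], protein_location: dict[str, str]
) -> dict[str, str]:
    """Majority vote over annotated partners; ties go to the alphabetically
    first compartment; peptides with no annotated partner are skipped."""
    calls = {}
    for peptide, partners in network.items():
        votes = Counter(protein_location[p] for p in partners if p in protein_location)
        if votes:
            best = max(votes.values())
            calls[peptide] = min(loc for loc, n in votes.items() if n == best)
    return calls


if __name__ == "__main__":
    predictions = [  # (peptide, partner, ipTM): toy values only
        ("pep_1", "protA", 0.82), ("pep_1", "protB", 0.71), ("pep_1", "protC", 0.40),
        ("pep_2", "protC", 0.77), ("pep_2", "protD", 0.65), ("pep_2", "protA", 0.30),
        ("pep_3", "protB", 0.20),
    ]
    locations = {"protA": "mitochondrion", "protB": "mitochondrion",
                 "protC": "lysosome", "protD": "lysosome"}
    print(infer_localization(build_network(predictions), locations))
```

Low-confidence pairs never enter the network, so a peptide such as `pep_3` whose only prediction falls below the cutoff receives no localization call rather than a weak one.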

4. Functional Validation and Clinical Relevance of Peptides

The team further validated the functions of four representative peptides (PEP1-NC-OLMALINC, PEP5-NC-TRHDE-AS1, PEP-NC-ZNF436-AS1, and PEP2-NC-AC027045.3) and found that they play important roles in mitochondrial complex assembly, energy metabolism, and cholesterol metabolism. In addition, dysregulated expression of these peptides was closely correlated with clinical prognosis.
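
Correlating peptide expression with prognosis is often done by splitting patients into high- and low-expression groups and comparing their Kaplan-Meier survival curves. The sketch below implements the Kaplan-Meier estimator for such a median split. The grouping rule, the `(expression, time, event)` patient layout, and the toy cohort are assumptions for illustration; the study's statistical methods may differ, and a formal comparison would add a log-rank test or Cox regression.

```python
# Sketch: Kaplan-Meier survival curves for patients with high vs low
# expression of a candidate peptide.
# Assumptions (illustrative only): a median split defines the two groups and
# each patient is (expression, follow-up time, event) with event=1 for death
# and event=0 for censoring. A real analysis would add a log-rank test or a
# Cox model; this only computes the curves.
from statistics import median

Patient = tuple[float, float, int]


def split_by_median(patients: list[Patient]) -> tuple[list[Patient], list[Patient]]:
    """Split patients into high (> median) and low (<= median) expression groups."""
    cutoff = median(p[0] for p in patients)
    high = [p for p in patients if p[0] > cutoff]
    low = [p for p in patients if p[0] <= cutoff]
    return high, low


def kaplan_meier(patients: list[Patient]) -> list[tuple[float, float]]:
    """Return (t, S(t)) at each distinct event time, where
    S(t) = product over event times t_i <= t of (1 - d_i / n_i),
    d_i = deaths at t_i and n_i = patients still at risk just before t_i."""
    data = sorted((time, event) for _, time, event in patients)
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group everyone sharing this time; censored patients still count as at risk at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


if __name__ == "__main__":
    cohort = [  # (expression, months, event): toy values, not study data
        (8.0, 5, 1), (7.5, 8, 1), (9.1, 12, 0), (6.9, 3, 1),
        (2.0, 20, 0), (1.5, 15, 1), (3.2, 24, 0), (2.8, 18, 0),
    ]
    high, low = split_by_median(cohort)
    print("high:", kaplan_meier(high))
    print("low:", kaplan_meier(low))
```

Censored patients leave the risk set without lowering the curve, which is why the toy high-expression curve stops at 0.25 even though one patient in that group is still alive at 12 months.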

Key Findings

  1. Identification of Novel Peptides: Using ultrafiltration combined with tandem mass spectrometry, the team identified 8,945 previously unannotated peptides, of which 4,097 were supported by a single peptide-spectrum match (PSM) and 4,866 by at least two distinct PSMs. Nearly half of these peptides were derived from non-coding RNAs, highlighting the potential of non-coding RNAs to encode functional peptides.

  2. CRISPR Screening and Functional Validation: Through CRISPR screening, the team identified 1,161 peptides associated with tumor cell proliferation. Functional validation of selected candidates showed that they contribute substantially to tumor cell proliferation and that this effect depends on translation of the peptides themselves rather than on their host lncRNA transcripts.

  3. AI-Based Structure Prediction and Peptide-Protein Interaction Network Analysis: By combining AlphaFold2 structure prediction with peptide-protein interaction network analysis, the team showed that these peptides localize to diverse subcellular compartments and are involved in cellular and energy metabolism. In particular, these peptides may regulate metabolic processes by interacting with proteins in organelles such as mitochondria and lysosomes.

  4. Functional Validation and Clinical Relevance of Peptides: Through in vitro and in vivo experiments, the team validated the functions of four representative peptides, finding that they play crucial roles in tumor growth and metabolic regulation. Clinical sample analysis showed that the expression levels of these peptides were closely correlated with the prognosis of gastric cancer patients, suggesting their potential as cancer biomarkers.
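
The evidence tiers in the first finding (a single PSM versus at least two distinct PSMs) can be illustrated with a short sketch that groups PSMs by peptide and counts distinct spectra. The `(spectrum_id, peptide_id)` input format and all identifiers are hypothetical, and the code assumes false-discovery-rate filtering has already been applied upstream.

```python
# Sketch: tier peptide identifications by the number of distinct spectra
# (PSMs) supporting them.
# Assumptions: PSMs are already FDR-filtered; each PSM is given as
# (spectrum_id, peptide_id); two PSMs count as distinct only if they come from
# different spectra. The input format is hypothetical.
from collections import defaultdict


def evidence_tiers(psms: list[tuple[str, str]]) -> tuple[set[str], set[str]]:
    """Return (single, multi): peptides backed by exactly one distinct
    spectrum and peptides backed by two or more."""
    spectra: dict[str, set[str]] = defaultdict(set)
    for spectrum_id, peptide_id in psms:
        spectra[peptide_id].add(spectrum_id)  # duplicate rows collapse here
    single = {p for p, s in spectra.items() if len(s) == 1}
    multi = {p for p, s in spectra.items() if len(s) >= 2}
    return single, multi


if __name__ == "__main__":
    psms = [
        ("scan_001", "pep_1"), ("scan_002", "pep_1"),
        ("scan_003", "pep_2"), ("scan_003", "pep_2"),  # same spectrum reported twice
        ("scan_004", "pep_3"),
    ]
    single, multi = evidence_tiers(psms)
    print(sorted(single), sorted(multi))
```

Because every peptide falls into exactly one tier, the two tier sizes always sum to the number of distinct identified peptides, which makes a quick consistency check for reported counts.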

Conclusions and Significance

This study represents the first systematic discovery and functional characterization of non-canonical peptides in the human genome, revealing the potential of non-coding RNAs to encode functional peptides. By combining high-throughput mass spectrometry, CRISPR screening, and AI-based structure prediction, the team successfully identified a large number of novel peptides and uncovered their important roles in tumor cell proliferation and metabolic regulation. These findings not only provide new perspectives for understanding genome function but also offer potential biomarkers and drug targets for cancer diagnosis and treatment.

Research Highlights

  1. High-Throughput Peptide Identification: Using ultrafiltration combined with tandem mass spectrometry, the team identified 8,945 novel peptides, the largest number of novel peptides ever identified at the proteomic level.

  2. CRISPR Screening and Functional Validation: Through CRISPR screening, the team identified 1,161 peptides associated with tumor cell proliferation and validated their functions through various experiments.

  3. AI-Based Structure Prediction: For the first time, the team integrated AlphaFold2 structure prediction with peptide-protein interaction network analysis, uncovering the diverse functional mechanisms of these peptides.

  4. Clinical Relevance: The team validated the functions of four representative peptides in tumor growth and metabolic regulation and found that their expression levels were closely correlated with the prognosis of gastric cancer patients, suggesting their potential as cancer biomarkers.

Additional Valuable Information

The research team also established a database named “Human Novel Peptides Atlas Database” (http://hmpa.zju.edu.cn/) to integrate and share research data on novel peptides, providing a valuable resource for future studies.