Sequence-Based Functional Metagenomics Reveals Novel Natural Diversity of the Cu Resistance Gene COPA in Environmental Microbiomes


The natural diversity of functional genes and proteins in environmental microbiomes is an essential resource for evolutionary and bioengineering research. To better understand the diversity of the copper (Cu) resistance gene COPA in global microbiomes, this study used a sequence-based functional metagenomics approach. By combining metagenomic assembly, local BLAST, evolutionary trace analysis (ETA), chemical gene synthesis, and conventional functional genomics, the study efficiently mined COPA diversity from environmental DNA (eDNA).

Research Background

Microbial evolution has produced diverse functional genes and proteins, which are widely applied in fields such as microbial phylogenetics and protein engineering. For example, genes such as RPOB (DNA-directed RNA polymerase subunit beta) and NIFH (nitrogenase iron protein) are widely used to identify and describe uncultivated microbial ‘dark matter’. Known functional proteins represent only a small fraction of the proteins produced by natural selection. Recovering the diversity of natural functional protein variants at high throughput helps reveal how existing natural proteins differ from random sequences, and supplies large, naturally selected sequence-variant libraries as a foundation for protein engineering.

However, the natural diversity of some functional genes and proteins, such as metal resistance genes, remains difficult to explore because they are scarce in the environment and poorly represented by characterized sequences in common databases. Metagenomic data, which capture the collective genetic information of environmental DNA, offer an ideal route to exploring this diversity. Traditionally, functional genes were detected by probing the genomes of pure cultures. Sequence-based metagenomics, by contrast, avoids both the limitations of activity-based functional screening and the need for repeated culture isolation.

Research Source

This paper was written by Wenjun Li, Likun Wang, Xiaofang Li, Xin Zheng, Michael F. Cohen, and Yong-Xin Liu, of the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences; the Hebei Key Laboratory of Soil Ecology; Sonoma State University (USA); and the State Key Laboratory of Plant Genomics, Chinese Academy of Sciences. It was published in the journal “Genomics, Proteomics & Bioinformatics” in 2023.

Research Process Introduction

1. Data Collection and Processing

Eighty-seven metagenomic datasets representing diverse environmental microbiomes worldwide were collected from public databases. The metagenomes were assembled and quality-controlled on the MG-RAST server to ensure data integrity and accuracy, and the resulting sequences were then searched with local BLAST to retrieve COPA gene sequences.
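
The BLAST retrieval step can be illustrated with a minimal Python sketch that parses BLAST+ tabular output (-outfmt 6, whose twelve default columns are listed in FIELDS) and keeps the best-scoring hit for each metagenomic sequence. The e-value, identity, and alignment-length cut-offs, as well as the sequence IDs in the example, are hypothetical placeholders; the thresholds actually used in the study are not stated here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str       # reference COPA protein used as the query
    subject: str     # metagenomic sequence ID
    pident: float    # percent identity
    aln_len: int     # alignment length
    evalue: float
    bitscore: float


def parse_blast_tab(lines: Iterable[str]) -> list[Hit]:
    """Parse -outfmt 6 lines, skipping blank lines and '#' comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hits.append(Hit(row["qseqid"], row["sseqid"], float(row["pident"]),
                        int(row["length"]), float(row["evalue"]),
                        float(row["bitscore"])))
    return hits


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-10,
              min_pident: float = 30.0, min_aln_len: int = 200) -> dict[str, Hit]:
    """Keep the highest-bitscore hit per subject that passes the (hypothetical) cut-offs."""
    best: dict[str, Hit] = {}
    for h in hits:
        if h.evalue > max_evalue or h.pident < min_pident or h.aln_len < min_aln_len:
            continue
        if h.subject not in best or h.bitscore > best[h.subject].bitscore:
            best[h.subject] = h
    return best


example = [
    "copA_ref1\tcontig_1_orf3\t45.2\t780\t400\t10\t1\t780\t5\t785\t1e-150\t600",
    "copA_ref2\tcontig_1_orf3\t38.0\t700\t420\t12\t1\t700\t9\t709\t1e-90\t350",
    "copA_ref1\tcontig_9_orf1\t25.0\t150\t110\t3\t1\t150\t1\t150\t1e-3\t40",
]
for subject, hit in best_hits(parse_blast_tab(example)).items():
    print(subject, hit.query, hit.bitscore)
```

In this toy input, the weak third hit fails every cut-off, and the two hits on contig_1_orf3 are collapsed to the one with the higher bitscore.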

2. COPA Gene Retrieval and Analysis

BLAST searches of all assembled metagenomes returned 93,899 hits. Manual screening kept 1,214 high-confidence hits, from which 517 unique COPA candidate sequences were retrieved. Evolutionary trace analysis of these candidates then yielded 175 high-quality novel COPA sequences, and phylogenetic tree analysis was used to examine their evolutionary relationships with known COPA proteins.
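
The shrinking candidate set can be made concrete with a short Python sketch. One helper collapses candidates with identical amino-acid sequences (exact-match deduplication is an assumption; the study may have defined "unique" differently, for example by clustering), and another computes the step-to-step retention of the screening funnel from the reported counts.

```python
from __future__ import annotations


def deduplicate(records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Collapse records whose sequences match after upper-casing and trimming a stop '*'.

    The first ID seen for each sequence is kept; insertion order is preserved.
    """
    first_id: dict[str, str] = {}
    for seq_id, seq in records:
        key = seq.upper().rstrip("*")
        first_id.setdefault(key, seq_id)
    return [(seq_id, seq) for seq, seq_id in first_id.items()]


# Counts reported for the screening funnel.
FUNNEL = [
    ("BLAST hits", 93_899),
    ("high-confidence hits", 1_214),
    ("unique candidates", 517),
    ("ETA-passing COPA sequences", 175),
]


def retention(funnel: list[tuple[str, int]]) -> list[tuple[str, int, float]]:
    """Fraction of the previous step retained at each later step."""
    return [(name, n, n / prev_n)
            for (_, prev_n), (name, n) in zip(funnel, funnel[1:])]


for name, n, frac in retention(FUNNEL):
    print(f"{name}: {n} ({frac:.1%} of previous step)")
print(f"overall: {FUNNEL[-1][1] / FUNNEL[0][1]:.2%} of BLAST hits")
```

The arithmetic shows how selective the pipeline is: roughly 1.3% of BLAST hits pass the high-confidence screen, and only about 0.19% of all hits end up as ETA-passing COPA sequences. The manual screening itself is not modeled here.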

3. Functional Validation Experiments

Ten novel COPA genes were chemically synthesized and heterologously expressed in a Cu-sensitive Escherichia coli mutant (ΔcopA). Growth tests and Cu uptake measurements showed that five of the novel clones improved the host’s Cu resistance and Cu uptake. One recombinant, COPA-like 15 (copal15), restored the host’s Cu resistance and significantly enhanced its Cu uptake. Two of the novel COPA genes were also fused to GFP and examined by microscopy in E. coli, which confirmed their expression and localization to the cell membrane.

Research Results

ETA and Structural Features of COPA Proteins

The 34 known COPA proteins included in the ETA derive from 14 bacterial species. Almost all of them function in Cu efflux; the exception is the COPA of Enterococcus hirae, which is annotated as a copper-uptake P-type ATPase. COPA proteins typically contain about 800 amino acids; the longest, from Yersinia pestis, has 961. Across the 14 COPA groups, most proteins carry 1 to 3 heavy-metal-associated (HMA) domains, whereas the COPA from Legionella pneumophila has none. All COPA proteins contain an E1-E2 ATPase domain, which couples ATP hydrolysis to Cu binding and efflux through conformational changes.
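
The idea behind evolutionary trace analysis can be illustrated with a simplified Python sketch of the classic rank-based trace. The sequences are split into progressively finer groups (in practice by cutting a phylogenetic tree at successive levels), and each alignment column receives the smallest number of groups at which it is invariant within every group. Rank 1 means the position is conserved across all sequences; higher ranks mark positions conserved only within subfamilies. Here the partitions are supplied by hand and the four-sequence alignment is invented; this is not the ETA implementation used in the study.

```python
from __future__ import annotations


def invariant_within_groups(column: dict[str, str], groups: list[list[str]]) -> bool:
    """True if every group shows a single residue at this column."""
    return all(len({column[sid] for sid in group}) == 1 for group in groups)


def trace_ranks(alignment: dict[str, str],
                partitions: list[list[list[str]]]) -> list[int | None]:
    """Assign each alignment column its evolutionary trace rank.

    `partitions` must be ordered from coarsest (one group holding all IDs) to
    finest, as obtained by cutting a tree at successive levels. A column's rank
    is the number of groups in the first partition where it is invariant within
    every group; None if no supplied partition satisfies this.
    """
    length = len(next(iter(alignment.values())))
    ranks: list[int | None] = []
    for i in range(length):
        column = {sid: seq[i] for sid, seq in alignment.items()}
        rank = None
        for groups in partitions:
            if invariant_within_groups(column, groups):
                rank = len(groups)
                break
        ranks.append(rank)
    return ranks


# Toy aligned fragments (invented) and nested partitions from a hypothetical tree.
ALIGNMENT = {"A": "CPCA", "B": "CPCS", "C": "CPHA", "D": "CPHG"}
PARTITIONS = [
    [["A", "B", "C", "D"]],
    [["A", "B"], ["C", "D"]],
    [["A"], ["B"], ["C", "D"]],
    [["A"], ["B"], ["C"], ["D"]],
]
print(trace_ranks(ALIGNMENT, PARTITIONS))  # [1, 1, 2, 4]
```

Full ET implementations also handle alignment gaps and often use real-valued scores; this sketch treats a gap character like any other residue.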

Novel and Diverse COPA Genes in Global Microbiomes

Local BLAST searches covered more than 5.5 million contigs and 134 million amino acid sequences from 88 metagenomic datasets, 87 of which were ultimately included in the analysis. Of the 93,899 hits obtained, 517 sequences, 500 to 900 amino acids long, were retained after manual screening. Of these, 315 had transmembrane helices and 222 contained HMA domains, and 175 COPA-like genes were finally selected. Kraken 2 classification placed these genes mainly in five phyla (Proteobacteria, Actinobacteria, Euryarchaeota, Bacteroidetes, and Firmicutes), while 55 sequences could not be assigned to any known species.
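
The length and domain criteria, together with the taxonomic tally, can be sketched in Python as follows. The Candidate fields stand in for the outputs of a transmembrane-helix predictor and a domain search; these inputs, the example IDs, and the option to skip the HMA requirement are assumptions for illustration. The Kraken 2 helper relies only on the first column of Kraken 2's standard output, which is C for classified and U for unclassified sequences.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Candidate:
    seq_id: str
    length: int        # amino acids
    tm_helices: int    # predicted transmembrane helices (assumed precomputed)
    has_hma: bool      # heavy-metal-associated (HMA) domain detected
    has_e1e2: bool     # E1-E2 ATPase domain detected


def passes_structure(c: Candidate, min_len: int = 500, max_len: int = 900,
                     require_hma: bool = False) -> bool:
    """Length window plus membrane and ATPase features; HMA is optional."""
    if not (min_len <= c.length <= max_len):
        return False
    if c.tm_helices == 0 or not c.has_e1e2:
        return False
    return c.has_hma or not require_hma


def kraken2_status(lines: Iterable[str]) -> Counter:
    """Count 'C' (classified) and 'U' (unclassified) records in Kraken 2 standard output."""
    return Counter(line.split("\t", 1)[0] for line in lines if line.strip())


cands = [
    Candidate("orf_1", 812, 8, True, True),
    Candidate("orf_2", 760, 6, False, True),   # no HMA domain
    Candidate("orf_3", 420, 4, True, True),    # too short
]
print([c.seq_id for c in cands if passes_structure(c)])                    # ['orf_1', 'orf_2']
print([c.seq_id for c in cands if passes_structure(c, require_hma=True)])  # ['orf_1']
print(kraken2_status(["C\tcontig_1\t1224\t2400\t1224:2366",
                      "U\tcontig_2\t0\t1800\t0:1766"]))
```

Leaving the HMA requirement off by default reflects the observation that the L. pneumophila COPA lacks HMA domains; a hard HMA filter would discard such variants.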

Functional Validation

Ten COPA candidate genes were randomly selected from the 175 for chemical synthesis and expression in E. coli ΔcopA. Growth and Cu uptake tests showed that five of the novel COPA genes significantly enhanced the host’s Cu resistance and uptake. Notably, COPA-like 6 significantly inhibited host growth yet also significantly increased Cu uptake.
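
Comparisons of this kind can be sketched with Python's standard library: relative growth of a ΔcopA strain carrying a candidate gene versus an empty-vector control, plus Welch's t statistic for the difference in means. The OD600 values, the triplicate design, and the empty-vector control are hypothetical; the study's actual assay design and statistics are not described here.

```python
from __future__ import annotations

from math import sqrt
from statistics import mean, variance


def relative_growth(candidate: list[float], control: list[float]) -> float:
    """Mean candidate OD600 divided by mean control OD600 under the same Cu dose."""
    return mean(candidate) / mean(control)


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic for mean(a) - mean(b) and Welch-Satterthwaite df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical triplicate OD600 readings at one Cu concentration.
control = [0.20, 0.22, 0.21]     # E. coli ΔcopA + empty vector
candidate = [0.45, 0.48, 0.46]   # E. coli ΔcopA + a COPA-like gene
t, df = welch_t(candidate, control)
print(f"relative growth {relative_growth(candidate, control):.2f}, t = {t:.1f}, df = {df:.1f}")
```

The same functions could be applied to Cu uptake measurements. COPA-like 6 illustrates why the two readouts are evaluated separately: it reduced growth while increasing uptake.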

Research Conclusions

This study substantially expanded the known diversity of COPA proteins and established a sequence-based, high-throughput functional metagenomics method that avoids the biases of traditional approaches with respect to gene length, screening, and resistance gene abundance. It also revealed COPA genes from previously unknown species and their differing resistance mechanisms, providing a valuable resource for future protein engineering and for studies of metal resistance gene evolution.

Research Highlights

  1. Efficient Data Processing Method: Developed and applied an efficient sequence-based functional metagenomics method for mining and analyzing functional genes in environmental microbiomes worldwide.
  2. Diversity Revelation: Revealed the extensive diversity of COPA genes in environmental microbiomes, including new COPA sequences from a range of unknown species.
  3. Functional Validation: Validated newly discovered COPA genes through heterologous expression, demonstrating their potential to enhance copper resistance and uptake.

Value and Significance

This research is not only scientifically valuable for understanding the natural diversity of microbial metal resistance but also provides new gene resources and methods for future environmental remediation and bioengineering applications. In addition, the method offers a practical and efficient route for the functional mining of metagenomic data, with broad potential applications.