Scientists identify 4,539 previously undescribed small protein families in the mouth and gut microbiome

Although the study of the human microbiome has advanced over the past decade, largely thanks to multi-omics network analyses, many unknowns remain. For instance, up to 40% of genes in the Integrated Gene Catalogue of the human gut microbiome cannot be mapped to functional databases.

Scientists from Stanford University (USA), One Codex (USA), the Joint Genome Institute (USA), the Biomedical Sciences Research Center Alexander Fleming (Greece) and Lawrence Berkeley National Laboratory (USA) have now identified 4,539 families of conserved small proteins in human microbiomes, most of them of unknown function.

Linking genes encoded by human-associated microbes to specific phenotypes remains difficult. Part of the reason may be that methodological limitations have led researchers to overlook small open reading frames (sORFs), which encode small proteins fewer than 50 amino acids long.

Using data from the Human Microbiome Project, the authors characterized, for the first time, 4,539 small protein families across 1,773 metagenomes from healthy individuals. Most families were found only in the mouth and gut, possibly because other body sites are underrepresented in the Human Microbiome Project data the authors used.

Taxonomic classification, analysis of distribution across body sites and of subcellular localization, and prediction of antimicrobial activity suggested that small proteins encoded by the human microbiome may perform a wide range of previously unknown functions.

Most families were absent from traditional reference genomes and lacked any known protein domain, whereas only a small group of previously characterized small proteins was abundant in microbiome samples. Examples of known small proteins detected in the human microbiome dataset include ribosomal proteins, the quorum-sensing small proteins AgrD and ComC, and phenol-soluble modulins.

Comparison against publicly available metaproteomic datasets also provided evidence that a subset of the small protein families is transcribed and translated.

Predicted functions of the small protein families included housekeeping, cell-cell crosstalk, adaptation and defense against other bacteria or phages. Transmembrane and secreted proteins, which mediate most interactions between bacteria and their environment, were more abundant in the gut.

Notably, several families of small proteins, such as those related to mobile elements, mapped to more than one domain of life. Although most small protein families were classified as bacterial, 8 families were classified as eukaryotic and 152 were assigned to multiple groups spanning bacteria, eukaryotes and viruses. The authors note that these findings may point to ancient conservation of small proteins or to genetic transfer between evolutionarily distant organisms.

Overall, studying previously overlooked sORFs and the small proteins they encode provides a better understanding of the full coding potential of the human microbiome. Because small proteins abundant in the human microbiome may have functions that have not been previously reported, metagenomic mining efforts are crucial for moving from descriptive to mechanistic science in the microbiome field.


Sberro H, Fremin BJ, Zlitni S, et al. Large-scale analyses of human microbiomes reveal thousands of small, novel genes. Cell. 2019; 178:1245-1259.e14.

GMFH Editing Team