Assessing microbial genome representation across various reference databases: A comprehensive evaluation
DOI:
https://doi.org/10.7124/bc.000AFD
Keywords:
metagenomics, bacterial reference databases, taxonomic discrepancies
Abstract
Aim. Metagenomics research can provide significant insights into the composition, diversity and functions of mixed microbial communities found in various environments. To identify bacterial species, reads from samples are mapped to reference genomes held in bacterial reference databases. Multiple references may share the same taxonomic identifier yet contain different genomic information. This project was designed to uncover and correct inconsistencies in bacterial reference databases by comparing species names and genomic representation across the three most commonly used bacterial reference databases (PATRIC, RefSeq and Ensembl). Our first study, “Improving the usability and comprehensiveness of microbial databases” [1], assessed the concordance of the databases based solely on species names. We extended that research to compare the bacterial genomes themselves, in addition to the species names, and to estimate their similarity.
Conclusions. The lack of species and genus overlap between the databases undermines the accuracy of metagenomic analysis and underscores the critical need for a standardized integration of existing databases. Our analysis is intended to improve the identification and characterization of microbial life as well as the comparability and rigor of metagenomic research.
References
1. Loeffler C, et al. Improving the usability and comprehensiveness of microbial databases [published correction appears in BMC Biol. 2020; 18(1):92]. BMC Biol. 2020; 18(1):37.
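The name-based concordance check mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`normalize`, `overlap_report`), the normalization rule and the toy species lists are all hypothetical, and it covers only name overlap, not the genome similarity estimate.

```python
from __future__ import annotations

from itertools import combinations


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings match.

    Assumption: real databases need far more careful name reconciliation.
    """
    return " ".join(name.lower().split())


def overlap_report(databases: dict[str, set[str]]) -> dict[str, object]:
    """Summarize species-name overlap across databases (needs at least one)."""
    normalized = {db: {normalize(n) for n in names} for db, names in databases.items()}

    shared_by_all = set.intersection(*normalized.values())
    all_names = set.union(*normalized.values())

    # Jaccard index for every pair of databases: |A & B| / |A | B|.
    pairwise = {}
    for a, b in combinations(sorted(normalized), 2):
        union = normalized[a] | normalized[b]
        shared = normalized[a] & normalized[b]
        pairwise[(a, b)] = len(shared) / len(union) if union else 0.0

    return {
        "shared_by_all": shared_by_all,
        "total_distinct": len(all_names),
        "pairwise_jaccard": pairwise,
    }


# Hypothetical toy data; NOT taken from PATRIC, RefSeq or Ensembl.
TOY_DATABASES = {
    "PATRIC": {"Escherichia coli", "Bacillus subtilis", "Vibrio cholerae"},
    "RefSeq": {"escherichia coli", "Bacillus  subtilis", "Helicobacter pylori"},
    "Ensembl": {"Escherichia coli", "Helicobacter pylori"},
}

if __name__ == "__main__":
    report = overlap_report(TOY_DATABASES)
    print("Shared by all:", sorted(report["shared_by_all"]))
    print("Distinct names:", report["total_distinct"])
    for pair, score in report["pairwise_jaccard"].items():
        print(pair, round(score, 2))
```

A low pairwise Jaccard index here would correspond to the kind of limited species overlap the conclusions refer to; extending the idea to genus level would only require reducing each name to its first word before comparison.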
Published
2024-09-10
Section
Chronicle and Information