Researchers have developed VIRE, a database that integrates approximately 1.7 million viral genomes derived from more than 100,000 metagenomes worldwide. Metagenomic data is obtained by comprehensively sequencing all DNA present in an environment. This approach enables the recovery of genomic information from microorganisms and viruses that cannot be cultured in the laboratory.

The research was led by Peer Bork, Senior Scientist and Interim Director General at EMBL Heidelberg, and Suguru Nishijima, Project Associate Professor at the Life Science Data Research Center, Graduate School of Frontier Sciences, The University of Tokyo, and former Postdoctoral Fellow in the Bork Group. 

Viral Integrated Resource across Ecosystems (VIRE) is the largest and most comprehensive viral resource to date, providing a global foundation for understanding viral diversity across human-associated and environmental ecosystems. This work is expected to greatly advance understanding of the ecological roles of viruses and their interactions with microbial communities.

Although diverse viruses are known to inhabit ecosystems across the planet, the lack of a comprehensive framework has hindered systematic understanding of their global diversity. In particular, many viruses found in environments such as oceans, soils, and the human gut are bacteriophages, which infect bacteria. Because the majority of bacteriophages cannot be easily cultured in the laboratory, their diversity and functions have long remained elusive. 

Viral detection

Using state-of-the-art viral detection tools, the team identified viruses, primarily bacteriophages, in samples from diverse environments such as the human body, oceans, and soils, and predicted their taxonomy, hosts, and gene functions. Computational methods that detect viral genomes with high accuracy enabled them to collect and integrate approximately 1.7 million medium- to high-quality viral genomes, a substantial expansion beyond existing viral databases.

Furthermore, for viruses infecting bacteria and archaea, the team used CRISPR spacer sequences, a component of the CRISPR host defence system, to infer host organisms with high precision. Spacers are short DNA sequences that bacteria and archaea retain as a record of past viral infections, so matching them against viral genomes indicates which viruses have previously infected which hosts. The researchers also clarified the functions of viral genes by integrating annotations from multiple biological databases, such as KEGG and COG, which describe molecular pathways and gene functions.
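
The idea behind spacer-based host inference can be illustrated with a minimal Python sketch. This is not the VIRE pipeline, whose implementation is not described here. It assumes exact matching of a spacer (on either strand) against a viral genome, whereas real analyses typically use alignment tools that tolerate a small number of mismatches. The function names, host and virus identifiers, and sequences are all hypothetical toy data.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def infer_hosts(host_spacers: dict, viral_genomes: dict) -> dict:
    """Map each viral genome ID to the set of hosts with a matching spacer.

    A host is assigned to a virus if any of its spacers occurs exactly in
    the viral genome on either strand. Viruses with no match are omitted.
    """
    hits = defaultdict(set)
    for virus_id, genome in viral_genomes.items():
        genome = genome.upper()
        for host, spacers in host_spacers.items():
            for spacer in spacers:
                spacer = spacer.upper()
                if spacer in genome or reverse_complement(spacer) in genome:
                    hits[virus_id].add(host)
                    break  # one matching spacer is enough for this host
    return dict(hits)


# Hypothetical toy data: spacers recovered from two host genomes.
host_spacers = {
    "host_A": ["ACGTTGCAAGGCTTAACCGG"],
    "host_B": ["TTTTGGGGCCCCAAAATTGG"],
}

# Hypothetical viral genomes: virus_1 carries the host_A spacer target,
# virus_2 carries the host_B target on the opposite strand, and virus_3
# matches nothing.
viral_genomes = {
    "virus_1": "GGG" + "ACGTTGCAAGGCTTAACCGG" + "TTT",
    "virus_2": "CC" + reverse_complement("TTTTGGGGCCCCAAAATTGG") + "AA",
    "virus_3": "ATATATATATATATATATAT",
}

predictions = infer_hosts(host_spacers, viral_genomes)
for virus_id, hosts in sorted(predictions.items()):
    print(virus_id, "->", sorted(hosts))
```

In this toy example virus_1 is linked to host_A and virus_2 to host_B, while virus_3 receives no host prediction because no spacer matches it. Checking both strands matters because a spacer can correspond to either strand of the viral genome.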

World’s largest

VIRE is now the world’s largest integrated platform providing viral taxonomy, predicted hosts, and gene functions in a unified framework. It is expected to enable data-driven research across a wide range of fields, including viral ecology, microbial evolution, and environmental sciences.

This achievement represents a major step forward in understanding the global diversity of viruses and will contribute to uncovering virus–microbe interactions as well as advancing studies on environmental change, human health, and disease.
