Data proliferation, reconciliation, and synthesis in viral ecology

Gibb, Rory, Albery, Gregory ORCID: 0000-0001-6260-2662, Becker, Daniel ORCID: 0000-0003-4315-8628, Brierley, Liam, Connor, Ryan, Dallas, Tad, Eskew, Evan ORCID: 0000-0002-1153-5356, Farrell, Maxwell ORCID: 0000-0003-0452-6993, Rasmussen, Angela ORCID: 0000-0001-9462-3169, Ryan, Sadie ORCID: 0000-0002-4308-6321
et al. (2021) Data proliferation, reconciliation, and synthesis in viral ecology.

2021.01.14.426572v1.full.pdf - Submitted Version

The fields of viral ecology and evolution have rapidly expanded in the last two decades, driven by technological improvements, and motivated by efforts to discover potentially zoonotic wildlife viruses under the rubric of pandemic prevention. One consequence has been a massive proliferation of host-virus association data, which comprise the backbone of research in viral macroecology and zoonotic risk prediction. These data remain fragmented across numerous data portals and projects, each with their own scope, structure, and reporting standards. Here, we propose that synthesis of host-virus association data is a central challenge to improve our understanding of the global virome and develop foundational theory in viral ecology. To illustrate this, we build an open reconciled mammal-virus database from four key published datasets, applying a standardized taxonomy and metadata. We show that reconciling these datasets provides a substantially richer view of the mammal virome than that offered by any one individual database. We argue for a shift in best practice towards the incremental development and use of synthetic datasets in viral ecology research, both to improve comparability and replicability across studies, and to facilitate future efforts to use machine learning to predict the structure and dynamics of the global virome.

Item Type: Article
Divisions: Faculty of Health and Life Sciences
Faculty of Health and Life Sciences > Institute of Population Health
Depositing User: Symplectic Admin
Date Deposited: 12 Mar 2021 13:49
Last Modified: 08 Sep 2022 07:26
DOI: 10.1101/2021.01.14.426572