Deep Learning-based structural and functional annotation of Pandoravirus hypothetical proteins

Horder, Joseph ORCID: 0000-0002-5714-6655, Connor, Abbie, Duggan, Amy, Hale, Joshua, McDermott, Frederick, Norris, Luke, Whinney, Sophie JD, Mesdaghi, Shahram, Murphy, David, Simpkin, Adam et al. (2023) Deep Learning-based structural and functional annotation of Pandoravirus hypothetical proteins. [Preprint]

Giant viruses, including Pandoraviruses, contain large amounts of genomic ‘dark matter’ - genes encoding proteins of unknown function. New-generation, deep learning-based protein structure modelling offers new opportunities to apply structure-based function inference to these sequences, often labelled as hypothetical proteins. However, the AlphaFold Protein Structure Database, a convenient resource covering the majority of UniProt, currently lacks models for most viral proteins. Here, we apply a panoply of predictive methods to protein structure predictions representative of large clusters of hypothetical proteins shared among four Pandoraviruses. In several cases, strong functional predictions can be made. For example, we identify a likely nucleotidyltransferase putatively involved in viral tRNA maturation that has a BTB domain presumably involved in protein-protein interactions. We further identify a cluster of membrane channel sequences presenting three paralogous families which may, as seen in other giant viruses, induce host cell membrane depolarization. We also identify homologues of calcium-activated potassium channel beta subunits and pinpoint their likely Acanthamoeba cellular alpha subunit counterparts. Despite these successes, many other clusters remain cryptic, having folds that are either too functionally promiscuous or too novel to provide strong clues as to their role. These results suggest that significant structural and functional novelty remains to be uncovered in the giant virus proteomes.

Item Type: Preprint
Uncontrolled Keywords: 3101 Biochemistry and Cell Biology, 3102 Bioinformatics and Computational Biology, 31 Biological Sciences, Genetics, Machine Learning and Artificial Intelligence, Biotechnology, Infectious Diseases, 2.1 Biological and endogenous factors, 2 Aetiology, Infection
Divisions: Faculty of Health and Life Sciences
Faculty of Health and Life Sciences > Institute of Systems, Molecular and Integrative Biology
Depositing User: Symplectic Admin
Date Deposited: 15 Mar 2024 16:43
Last Modified: 20 Jun 2024 20:35
DOI: 10.1101/2023.12.02.569716