findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM



Chojnowski, Grzegorz, Simpkin, Adam J, Leonardo, Diego A, Seifert-Davila, Wolfram, Vivas-Ruiz, Dan E, Keegan, Ronan M and Rigden, Daniel J ORCID: 0000-0002-7565-8937
(2022) findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM. IUCrJ, 9 (1). pp. 86-97.

The full text of this item is available via the Open Access URL listed below.

Abstract

Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They can be purified from natural sources, appear as an unexpected fragment of a well characterized protein or appear as a contaminant. Regardless of the source of the problem, the unknown protein always requires characterization. Here, an automated pipeline is presented for the identification of protein sequences from cryo-EM reconstructions and crystallographic data. The method's application to characterize the crystal structure of an unknown protein purified from a snake venom is presented. It is also shown that the approach can be successfully applied to the identification of protein sequences and validation of sequence assignments in cryo-EM protein structures.

Item Type: Article
Uncontrolled Keywords: protein structures, protein sequences, SIMBAD, cryo-EM, bioinformatics, structure determination, findMySequence, neural networks
Divisions: Faculty of Health and Life Sciences
Faculty of Health and Life Sciences > Institute of Systems, Molecular and Integrative Biology
Depositing User: Symplectic Admin
Date Deposited: 31 Jan 2022 09:12
Last Modified: 18 Jan 2023 21:14
DOI: 10.1107/s2052252521011088
Open Access URL: https://journals.iucr.org/m/issues/2022/01/00/pw50...
URI: https://livrepository.liverpool.ac.uk/id/eprint/3147829