Using deep-learning predictions reveals a large number of register errors in PDB depositions.



Sánchez Rodríguez, Filomeno, Simpkin, Adam J ORCID: 0000-0003-1883-9376, Chojnowski, Grzegorz, Keegan, Ronan M ORCID: 0000-0002-9495-0431 and Rigden, Daniel J ORCID: 0000-0002-7565-8937
(2024) Using deep-learning predictions reveals a large number of register errors in PDB depositions. IUCrJ, 11 (Pt 6). pp. 938-950. ISSN 2052-2525

Abstract

The accuracy of the information in the Protein Data Bank (PDB) is of great importance for the myriad downstream applications that make use of protein structural information. Despite best efforts, the occasional introduction of errors is inevitable, especially where the experimental data are of limited resolution. A novel protein structure validation approach based on spotting inconsistencies between the residue contacts and distances observed in a structural model and those computationally predicted by methods such as AlphaFold2 has previously been established. It is particularly well suited to the detection of register errors. Importantly, this new approach is orthogonal to traditional methods based on stereochemistry or map-model agreement, and is resolution independent. Here, thousands of likely register errors are identified by scanning 3-5 Å resolution structures in the PDB. Unlike most methods, the application of this approach yields suggested corrections to the register of affected regions, which it is shown, even by limited implementation, lead to improved refinement statistics in the vast majority of cases. A few limitations and confounding factors such as fold-switching proteins are characterized, but this approach is expected to have broad application in spotting potential issues in current accessions and, through its implementation and distribution in CCP4, helping to ensure the accuracy of future depositions.

Item Type: Article
Uncontrolled Keywords: Proteins; Protein Conformation; Models, Molecular; Databases, Protein; Deep Learning
Divisions: Faculty of Health & Life Sciences
Faculty of Health & Life Sciences > Inst. Systems, Molec & Integrative Biology
Depositing User: Symplectic Admin
Date Deposited: 01 Nov 2024 14:49
Last Modified: 20 Dec 2025 21:33
DOI: 10.1107/S2052252524009114
Open Access URL: https://doi.org/10.1107/S2052252524009114
URI: https://livrepository.liverpool.ac.uk/id/eprint/3187006