Profiling the Human Phosphoproteome to Estimate the True Extent of Protein Phosphorylation

Kalyuzhnyy, Anton; Eyers, Patrick (ORCID: 0000-0002-9220-2966); Eyers, Claire (ORCID: 0000-0002-3223-5926); Sun, Zhi; Deutsch, Eric (ORCID: 0000-0001-8732-0928); and Jones, Andrew (ORCID: 0000-0001-6118-9327)
(2021) Profiling the Human Phosphoproteome to Estimate the True Extent of Protein Phosphorylation. [Preprint]

PDF (published version, 5 MB): Kalyuzhnyy FLR global phospho acs.jproteome.2c00131.pdf


Mass spectrometry (MS)-based phosphoproteomics allows large-scale generation of phosphorylation site data. However, analytical pipelines need to be carefully designed and optimised to minimise incorrect identification of phosphopeptide sequences or incorrect localisation of phosphorylation sites within those peptides. Public databases such as PhosphoSitePlus (PSP) and PeptideAtlas (PA) compile results from published papers or openly available MS data, but, to our knowledge, there is no database-level control for false discovery of sites, which likely leads to overestimation of the number of true phosphosites. It is therefore difficult for researchers to assess which phosphosites are “real” and which are likely to be artefacts of data processing. By profiling the human phosphoproteome, we aimed to estimate the false discovery rate (FDR) of phosphosites based on available evidence in PSP and/or PA and to predict a more realistic count of true phosphosites. We ranked sites into phosphorylation likelihood sets based on layers of accumulated evidence and then analysed them in terms of amino acid conservation across 100 species, sequence properties and functional annotations of associated proteins. We demonstrated significant differences between the sets and developed a method for independent phosphosite FDR estimation. Remarkably, we estimated FDRs of 86.1%, 95.4% and 82.2% for the sets of described phosphoserine (pSer), phosphothreonine (pThr) and phosphotyrosine (pTyr) sites, respectively, that are supported by only a single piece of identification evidence (the vast majority of sites in PSP). Overall, we estimate that ∼56,000 Ser, 10,000 Thr and 12,000 Tyr phosphosites in the human proteome have truly been identified to date, based on evidence in PSP and/or PA, which is lower than most published estimates. Furthermore, our analysis estimated ∼91,000 Ser, 49,000 Thr and 26,000 Tyr sites that are likely to represent false-positive phosphosite identifications.
We conclude that researchers should be aware of the significant potential for false-positive sites to be present in public databases and should evaluate the evidence behind the phosphosites used in their research.
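To make the FDR figures above more concrete, here is a small back-of-the-envelope sketch. It is not the authors' estimation method; it only applies the standard definition FDR = FP / (TP + FP) to the rounded counts quoted in the abstract. It assumes that, for each residue, the "truly identified" and "false-positive" estimates are disjoint and together cover all sites with evidence in PSP and/or PA. Under that assumption the implied pooled FDRs come out at roughly 62% (Ser), 83% (Thr) and 68% (Tyr). These are derived numbers, not figures reported in the paper. The helper names (pooled_fdr, expected_true) and the 1,000-site example are hypothetical illustrations.

```python
# Back-of-the-envelope FDR arithmetic using the rounded counts quoted in the abstract.
# ASSUMPTION: for each residue, the "true" and "false-positive" estimates are
# disjoint and together cover all sites with evidence in PSP and/or PA.

ESTIMATES = {
    # residue: (estimated true sites, estimated false-positive sites)
    "Ser": (56_000, 91_000),
    "Thr": (10_000, 49_000),
    "Tyr": (12_000, 26_000),
}


def pooled_fdr(true_sites: int, false_sites: int) -> float:
    """FDR = FP / (TP + FP): the fraction of reported sites that are false."""
    total = true_sites + false_sites
    if total == 0:
        raise ValueError("no sites to evaluate")
    return false_sites / total


def expected_true(n_sites: int, fdr: float) -> float:
    """Expected number of genuine sites in a set of n_sites with a given FDR."""
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must lie in [0, 1]")
    return n_sites * (1.0 - fdr)


if __name__ == "__main__":
    for residue, (tp, fp) in ESTIMATES.items():
        print(f"{residue}: {tp + fp:,} sites, implied pooled FDR {pooled_fdr(tp, fp):.1%}")
    # Hypothetical example: 1,000 single-evidence pSer sites at the quoted 86.1% FDR.
    print(round(expected_true(1_000, 0.861)))
```

The last line illustrates how steep the reported single-evidence FDR is. Of a hypothetical 1,000 pSer sites backed by one piece of evidence, only about 139 would be expected to be real.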

Item Type: Preprint
Divisions: Faculty of Health and Life Sciences
Faculty of Health and Life Sciences > Institute of Systems, Molecular and Integrative Biology
Depositing User: Symplectic Admin
Date Deposited: 21 Oct 2022 08:32
Last Modified: 18 Jan 2023 19:49
DOI: 10.1101/2021.04.14.439901