Profiling the Human Phosphoproteome to Estimate the True Extent of Protein Phosphorylation and Phosphosite Conservation



Kalyuzhnyy, Anton
(2023) Profiling the Human Phosphoproteome to Estimate the True Extent of Protein Phosphorylation and Phosphosite Conservation. PhD thesis, University of Liverpool.

AntonK_Supplementary_Information.pdf - Supporting information
200881687_Mar2023.pdf - Author Accepted Manuscript

Abstract

Protein phosphorylation is a fundamental post-translational modification (PTM) that regulates protein function and is well studied in relation to cell signalling pathways and disease. The development of high-throughput proteomics pipelines, such as those based on tandem mass spectrometry, has led to the discovery of large numbers of specific phosphorylated protein motifs and sites, primarily on serine, threonine and tyrosine residues. However, there is no database-level control of the false discovery rate (FDR) of sites, which likely leads to overestimation of the number of true phosphosites reported in phosphorylation resources. In addition, the vast majority of phosphosite discoveries are made in humans, and many other species have only a few reported phosphosites. Furthermore, only a small fraction of the currently characterised human phosphoproteome has an annotated functional role, and studies that predict the functional relevance of phosphosites on a large scale, using techniques such as conservation analysis, are scarce. This thesis therefore profiled the human phosphoproteome to estimate the true extent of protein phosphorylation and to understand the evolutionary and functional trends of phosphosites. First, in Chapter 2, we developed and validated an accessible Python pipeline that determines the conservation of specific amino acid sites, such as PTM sites, and performs several steps of a typical conservation analysis in a single run. In particular, for each query protein sequence, the pipeline identifies its likely homologous sequences in the selected species using the BLAST algorithm, generates multiple sequence alignments and calculates the conservation of the target amino acid sites. In Chapter 3, we profiled the human phosphoproteome and developed a method for independently estimating the phosphosite FDR in large datasets.
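The per-site conservation step that the Chapter 2 pipeline performs after the BLAST search and multiple sequence alignment can be illustrated with a minimal Python sketch. It assumes the alignment has already been produced and is supplied as a dictionary of gapped sequences of equal length. The function names (alignment_column, site_conservation, conservation_fraction), the toy sequences and the rule that counts a Ser/Thr substitution as conserved are all hypothetical illustrations, not the pipeline's actual interface or scoring.

```python
from __future__ import annotations

# Hypothetical sketch: names and the conservation rule are illustrative assumptions,
# not the thesis pipeline's real API. The alignment is assumed to exist already.


def alignment_column(gapped_seq: str, residue_pos: int) -> int:
    """Return the 0-based alignment column of the residue at 1-based ungapped position residue_pos."""
    seen = 0
    for col, ch in enumerate(gapped_seq):
        if ch != "-":
            seen += 1
            if seen == residue_pos:
                return col
    raise ValueError(f"position {residue_pos} exceeds sequence length {seen}")


def site_conservation(alignment: dict[str, str], query_id: str, site_pos: int,
                      allow_st_swap: bool = True) -> dict[str, bool]:
    """For every non-query sequence, report whether the residue aligned to the
    query site counts as conserved: identical residue, or (optionally) S<->T."""
    query = alignment[query_id]
    col = alignment_column(query, site_pos)
    query_res = query[col]
    if query_res not in "STY":
        raise ValueError(f"query residue {query_res!r} at {site_pos} is not S/T/Y")
    allowed = {"S", "T"} if allow_st_swap and query_res in "ST" else {query_res}
    # A gap or any residue outside `allowed` counts as not conserved.
    return {seq_id: seq[col] in allowed
            for seq_id, seq in alignment.items() if seq_id != query_id}


def conservation_fraction(calls: dict[str, bool]) -> float:
    """Fraction of homologous sequences in which the site is conserved."""
    return sum(calls.values()) / len(calls) if calls else 0.0


# Toy alignment (made-up sequences): query Ser3 aligns to S, T and a gap.
toy = {
    "human": "MKS-PTY",
    "mouse": "MKSAPTY",
    "fish":  "MKT-PAY",
    "yeast": "MK--PTF",
}
calls = site_conservation(toy, "human", 3)
print(calls)                                   # {'mouse': True, 'fish': True, 'yeast': False}
print(round(conservation_fraction(calls), 2))  # 0.67
```

Treating an aligned Thr as conserving a Ser site (and vice versa) is a common relaxation because both residues are targets of Ser/Thr kinases; passing allow_st_swap=False enforces strict identity instead.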
We grouped all reported human phosphosites into sets ranked by the amount of identification evidence they had in public databases and analysed the sets in terms of conservation across 100 species, sequence properties and functional annotations. We demonstrated significant differences between the sets and estimated that around 62,000 Ser, 8,000 Thr and 12,000 Tyr phosphosites in the human proteome were likely to be true, which is lower than most published estimates. Furthermore, our analysis estimated that 86,000 Ser, 50,000 Thr and 26,000 Tyr phosphosites were likely to be false-positive identifications, highlighting the substantial proportion of false-positive data that phosphorylation databases may contain. In Chapter 4, we analysed the evolutionary conservation of human phosphosites across different groups of eukaryotic species and linked their conservation patterns to diverse protein functions. Finally, we applied conservation analysis to predict over 1,000,000 potential phosphosites in eukaryotes, using confident human phosphosites as a reference set. Our results highlight the relevance of conservation analysis for studying phosphosites and can ultimately be used to improve the proteome annotations of several species.
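To make the arithmetic behind a conservation-based estimate of true sites concrete, the sketch below frames it as a simple two-component mixture. Suppose a set of n reported sites has an observed conservation rate c_obs, true phosphosites are conserved at rate c_true, and false positives are conserved at the background rate c_bg of comparable unmodified residues. Then c_obs = π·c_true + (1 − π)·c_bg, so the true fraction is π = (c_obs − c_bg) / (c_true − c_bg) and the expected number of true sites is π·n. This model, the function names and all numbers are illustrative assumptions; it is not necessarily the FDR method developed in Chapter 3, and the values are not results from the thesis.

```python
# Illustrative two-component mixture estimate; NOT necessarily the thesis's method.
# Assumption: a set's observed conservation rate is a weighted average of the rate
# for true phosphosites (c_true) and the background rate for unmodified residues (c_bg).


def true_fraction(c_obs: float, c_true: float, c_bg: float) -> float:
    """Solve c_obs = pi * c_true + (1 - pi) * c_bg for pi, clipped to [0, 1]."""
    if c_true <= c_bg:
        raise ValueError("c_true must exceed c_bg for the mixture to be identifiable")
    pi = (c_obs - c_bg) / (c_true - c_bg)
    return min(1.0, max(0.0, pi))


def expected_true_sites(sets, c_true: float, c_bg: float) -> dict:
    """sets: iterable of (label, n_sites, c_obs). Returns label -> estimated true count."""
    return {label: round(n * true_fraction(c_obs, c_true, c_bg))
            for label, n, c_obs in sets}


# Hypothetical evidence sets with made-up sizes and conservation rates.
toy_sets = [("gold", 1000, 0.75), ("silver", 2000, 0.5), ("bronze", 4000, 0.375)]
estimates = expected_true_sites(toy_sets, c_true=0.75, c_bg=0.25)
print(estimates)                # {'gold': 1000, 'silver': 1000, 'bronze': 1000}
print(sum(estimates.values()))  # 3000
```

Under this toy model the expected false positives in each set are n − π·n (here 0, 1000 and 3000). In practice such a calculation would be run separately for Ser, Thr and Tyr sites, since their background conservation rates need not be the same.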

Item Type: Thesis (PhD)
Divisions: Faculty of Health and Life Sciences
Faculty of Health and Life Sciences > Institute of Systems, Molecular and Integrative Biology
Depositing User: Symplectic Admin
Date Deposited: 20 Sep 2023 09:56
Last Modified: 20 Sep 2023 09:57
DOI: 10.17638/03171640
Supervisors:
  • Jones, Andrew
  • Eyers, Claire
URI: https://livrepository.liverpool.ac.uk/id/eprint/3171640