Machine learning enabled genetic and functional interpretation of the epitranscriptome

Song, Bowen (2022) Machine learning enabled genetic and functional interpretation of the epitranscriptome. PhD thesis, University of Liverpool.

Accepted Thesis_Bowen Song_201395062.pdf - Author Accepted Manuscript
Access to this file is embargoed until 1 January 2025.

Increasing evidence suggests that RNA modifications regulate many important biological processes. To date, more than 170 types of post-transcriptional RNA modification have been discovered. With recent advances in sequencing techniques, tens of thousands of modification sites are identified in a typical high-throughput experiment, which poses a key challenge: distinguishing the functional modified sites from the remaining ‘passenger’ (or ‘silent’) sites. Bioinformatics solutions with various focuses have been developed to ensure that these massive epitranscriptome datasets are properly exploited, annotated, and shared. In this thesis, we first describe a comparative conservation analysis of the human and mouse m6A epitranscriptome at single-site resolution. A novel scoring framework, ConsRM, was devised to quantitatively measure the degree of conservation of individual m6A sites. ConsRM integrates multiple information sources with a positive-unlabeled learning framework that combines genomic and sequence features to trace subtle hints of conservation at the epitranscriptome layer. In a series of validation experiments in mouse, fly and zebrafish, ConsRM outperformed the widely adopted conservation scores phastCons and phyloP in distinguishing conserved from non-conserved m6A sites. Additionally, m6A sites with a higher ConsRM score are more likely to be functionally important. To further unveil the functional epitranscriptome, we investigated the potential influence of genetic factors on epitranscriptome disturbance. Recent studies have found close associations between RNA modifications and multiple pathophysiological disorders, so the precise identification and large-scale prediction of disease-related modification sites can contribute to understanding potential disease mechanisms.
Consequently, we developed a computational pipeline to systematically identify RNA modification-associated variants and the modification regions they affect, with emphasis on their disease and trait associations. We then considered the dynamics of the RNA methylome across different tissues by elucidating the tissue-specific impact of somatic variants on m6A methylation. TCGA cancer mutations (derived from 27 cancer types) that may lead to the gain or loss of m6A sites in the corresponding cancer-originating tissues were systematically evaluated and collected. Taken together, the proposed bioinformatics pipelines and databases should serve as useful resources for functional discrimination and annotation of massive epitranscriptome data, with implications for potential disease mechanisms that function through the epitranscriptome layer.

Item Type: Thesis (PhD)
Divisions: Faculty of Health and Life Sciences
Depositing User: Symplectic Admin
Date Deposited: 19 Jan 2023 10:08
Last Modified: 19 Jan 2023 10:08
DOI: 10.17638/03165888