Statistical and Deep Learning Approaches for Characterizing RNA Modification with respect to RNA Sequence, Functional Component and Gene Structure



Wang, Yue
(2023) Statistical and Deep Learning Approaches for Characterizing RNA Modification with respect to RNA Sequence, Functional Component and Gene Structure. PhD thesis, University of Liverpool.

201334709.pdf - Author Accepted Manuscript
Access to this file is embargoed until 1 August 2026.


Abstract

As one of the most fundamental mechanisms for regulating gene expression, RNA modification is found universally in viral, prokaryotic and eukaryotic species. The development of high-throughput techniques enables genome-wide profiling of biological features of interest on a massive scale, making it possible for researchers to identify the specific locations of diverse RNA modification types, study their association with other genomic features or regions, and explore their molecular functions and related regulatory circuitry. However, a fundamental limitation of existing approaches is that they were designed primarily for genome-based analysis and thus fail to accommodate transcriptome heterogeneity: when a gene has multiple isoforms, the isoform to which an RNA feature belongs may be unknown, which can introduce bias into the analysis of transcriptome features, including RNA modification sites. The primary aim of this project was to develop statistical and deep learning approaches for characterizing RNA-related genomic features, specifically RNA modifications, in the presence of isoform heterogeneity and ambiguity. To achieve this, three original computational methods were proposed for the colocalization analysis, distribution visualization, and site prediction of transcriptome features, respectively. RgnTX is a software tool that conducts colocalization analysis, testing the association between transcriptome features and regions with permutation tests (Monte Carlo simulations). It offers high flexibility in the null model used to simulate a realistic transcriptome-wide background, and supports the testing of transcriptome elements without a clear isoform assignment. MetaTX deciphers the transcriptome-wide distribution of mRNA-related features. Through a standardized mRNA model, it unifies mRNA transcripts of diverse compositions and corrects for isoform ambiguity by incorporating the overall distribution pattern of the features through an EM algorithm. DPred provides a novel computational model, built upon a local self-attention mechanism and a convolutional neural network, for effectively predicting dihydrouridine modifications on mRNAs from primary RNA sequences, potentially revealing their different formation mechanisms and putative divergent functionality on distinct RNA types. This project attempts to systematically summarize the problems in current studies of RNA-related genomic attributes, and to develop statistical models, software tools and a standardized analysis process to improve the quality of RNA modification data analysis. The related software packages and website are freely available at the link given at the end of the corresponding chapter. The proposed computational approaches are expected to serve as standardized tools that facilitate the study of RNA modifications and other transcriptome-related features.
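The permutation-test idea behind colocalization analysis can be illustrated with a minimal sketch. This is not RgnTX's implementation or API. It is a toy example that assumes a single transcript with integer coordinates and a uniform null model, and every name in it (`overlaps`, `permutation_test`, `transcript_length`) is hypothetical. It counts how many feature positions fall inside a set of regions, then compares that count with counts obtained after repeatedly re-drawing the positions at random.

```python
import random


def overlaps(pos, regions):
    """Return True if pos lies inside any half-open [start, end) region."""
    return any(start <= pos < end for start, end in regions)


def permutation_test(features, regions, transcript_length, n_perm=1000, seed=0):
    """Toy colocalization test on one transcript (illustrative assumption).

    features: list of integer positions (e.g. modification sites)
    regions: list of (start, end) intervals on the same transcript
    Returns the observed overlap count and an empirical one-sided p-value.
    """
    rng = random.Random(seed)
    observed = sum(overlaps(p, regions) for p in features)

    # Null model (assumed here): positions are uniform along the transcript.
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = [rng.randrange(transcript_length) for _ in features]
        if sum(overlaps(p, regions) for p in shuffled) >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    sites = [95, 96, 97, 98, 99, 101, 102, 103]
    target = [(90, 110)]
    count, p = permutation_test(sites, target, transcript_length=1000)
    print(f"observed overlaps: {count}, empirical p-value: {p:.4f}")
```

A realistic analysis would draw the random positions from a transcriptome-wide background that accounts for multiple isoforms per gene, which is the part the abstract attributes to the flexible null model.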

Item Type: Thesis (PhD)
Uncontrolled Keywords: isoform ambiguity, epi-transcriptome, colocalization analysis, maximum likelihood, EM algorithm, local self-attention
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 30 Aug 2023 14:52
Last Modified: 30 Aug 2023 14:53
DOI: 10.17638/03171078
Supervisors:
  • Su, Jionglong
  • Coenen, Frans
  • Meng, Jia
URI: https://livrepository.liverpool.ac.uk/id/eprint/3171078