Applied Data Science Methods in Epitranscriptomic Bioinformatics



Wei, Zhen
(2020) Applied Data Science Methods in Epitranscriptomic Bioinformatics. PhD thesis, University of Liverpool.

Abstract

Chemical modifications on messenger RNA have recently been revealed by biological researchers to function as an essential layer of gene expression regulation. Molecular biologists from different laboratories have conducted more than 200 sets of high-throughput sequencing (HTS) experiments to capture the types and locations of messenger RNA modifications across multiple cell types and species. However, to date, the field still lacks a bioinformatics pipeline that consistently quantifies and analyzes the epitranscriptomic HTS data generated by different laboratories. This thesis aims to provide an overview of the questions and challenges that have arisen in the computational analysis of mRNA modifications. We then present a set of practical computational strategies for data exploration, genomic data mining, modification level quantification, and technical artifact correction from a data science perspective. The first chapter provides an in-depth data exploration and visualization of m5C mRNA modification from bisulfite sequencing data. The second chapter documents the database construction and data consistency exploration for the transcriptomic targets of mRNA modification-related protein regulators. It also presents a methodological framework for the computational representation of domain knowledge related to the transcriptomic topology of epitranscriptomic modifications. The final section discusses the dominant technical biases that exist in MeRIP-Seq, the most widely applied type of HTS data in epitranscriptomics, and follows with a practical computational pipeline to overcome these technical errors.

Item Type: Thesis (PhD)
Uncontrolled Keywords: epitranscriptomics, batch effect, technical bias correction, genomic data mining, bioinformatics, data science
Divisions: Faculty of Health and Life Sciences > Institute of Life Courses and Medical Sciences
Depositing User: Symplectic Admin
Date Deposited: 05 Mar 2020 10:11
Last Modified: 19 Jan 2023 00:04
DOI: 10.17638/03073578
URI: https://livrepository.liverpool.ac.uk/id/eprint/3073578