A data mining-based approach for investigating the relationship between DNA repair genes and ageing



Freitas, Alex Alves
A data mining-based approach for investigating the relationship between DNA repair genes and ageing. Master of Philosophy thesis, University of Liverpool.

FreitasAlex_Jan2011.pdf - Author Accepted Manuscript
Access to this file is embargoed until Unspecified.
After the embargo period this will be available under License Creative Commons Attribution No Derivatives.

FreitasAle_Jan2011_1475.pdf - Author Accepted Manuscript
Available under License Creative Commons Attribution No Derivatives.


Abstract

There is a clear motivation for ageing research, since ageing is the greatest risk factor for many diseases, including most types of cancer. Arguably, another strong motivation for ageing research is that, despite the large progress in this area in the last two decades, ageing is still to a large extent a poorly understood process, especially in humans. The vast majority of biogerontology research is still based on “wet lab” experiments done with simpler organisms, due to the problems associated with performing ageing-related experiments with humans. In contrast, this thesis proposes a data mining approach, based on classification algorithms, for analysing data about human DNA repair genes and their relationship to ageing. The classification algorithms (more precisely, decision tree induction and Naive Bayes algorithms) were applied to datasets prepared specifically for this research by adapting and integrating data from several bioinformatics resources, namely: (a) the GenAge database of ageing-related genes; (b) a website with a comprehensive list of human DNA repair genes; (c) UniProt, a centralized repository of richly annotated data about proteins; (d) the HPRD (Human Protein Reference Database); and (e) the Gene Ontology, a controlled vocabulary for describing gene or protein functions. Some experiments also used a separate dataset including gene expression data. The classification algorithms were applied to these datasets to produce classification models that identify which gene properties are most effective in discriminating ageing-related DNA repair genes from other types of genes. These other genes were mainly non-ageing-related DNA repair genes, but in some experiments they also included genes whose protein products interact with DNA repair genes.
A related goal of this research was to analyse the automatically built classification models from two perspectives, namely: (a) measuring the predictive accuracy (or “generalization ability”) of those models from a data mining perspective; and (b) interpreting the meaning of the main gene properties relevant for classification in those models, in the light of biological knowledge about DNA repair genes and the process of ageing. In summary, the main gene properties that were found effective in discriminating ageing-related DNA repair genes from other types of genes (mainly non-ageing-related DNA repair genes) in the datasets created in this research are as follows: ageing-related DNA repair genes’ protein products tend to interact with a considerably larger number of proteins; their protein products are much more likely to interact with WRN (a protein whose defect causes Werner’s progeroid syndrome) and XRCC5 (KU80, a key protein in the initiation of DNA double-strand break repair by the error-prone non-homologous end joining DNA repair pathway); they are more likely to be involved in response to chemical stimulus and, to a lesser extent, in response to endogenous stimulus or oxidative stress; and they are more likely to have high expression in T lymphocytes.
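
As a rough illustration of the kind of analysis described in the abstract, the sketch below trains a small Naive Bayes classifier on binary gene properties and estimates its predictive accuracy. It is not the thesis's implementation: the feature names, the toy data, and the choice of leave-one-out evaluation are all assumptions made for this example, and the numbers carry no biological meaning.

```python
import math

# Hypothetical toy data, invented for illustration only (not from the thesis).
# Each row holds binary gene properties and a class label
# (1 = ageing-related DNA repair gene, 0 = other DNA repair gene).
FEATURES = [
    "many_interaction_partners",
    "interacts_with_WRN",
    "response_to_chemical_stimulus",
]
TOY_GENES = [
    ((1, 1, 1), 1),
    ((1, 1, 0), 1),
    ((1, 0, 1), 1),
    ((0, 1, 1), 1),
    ((0, 0, 0), 0),
    ((0, 0, 1), 0),
    ((1, 0, 0), 0),
    ((0, 0, 0), 0),
]


def train_naive_bayes(rows):
    """Estimate Laplace-smoothed class priors and per-feature Bernoulli likelihoods."""
    model = {}
    for label in (0, 1):
        members = [x for x, y in rows if y == label]
        prior = (len(members) + 1) / (len(rows) + 2)
        likelihoods = [
            (sum(x[j] for x in members) + 1) / (len(members) + 2)
            for j in range(len(FEATURES))
        ]
        model[label] = (prior, likelihoods)
    return model


def predict(model, x):
    """Return the class label with the highest log-posterior score."""
    def score(label):
        prior, likelihoods = model[label]
        total = math.log(prior)
        for value, p in zip(x, likelihoods):
            total += math.log(p if value else 1.0 - p)
        return total
    return max((0, 1), key=score)


def leave_one_out_accuracy(rows):
    """Hold out each gene in turn, train on the rest, and count correct predictions."""
    correct = 0
    for i, (x, y) in enumerate(rows):
        model = train_naive_bayes(rows[:i] + rows[i + 1:])
        correct += predict(model, x) == y
    return correct / len(rows)


if __name__ == "__main__":
    print("Leave-one-out accuracy:", leave_one_out_accuracy(TOY_GENES))
```

Held-out evaluation of this kind corresponds to perspective (a) above, while inspecting the learned per-feature likelihoods is one simple way to approach perspective (b), seeing which properties separate the two classes.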

Item Type: Thesis (Master of Philosophy)
Additional Information: Date: 2011-01 (completed)
Uncontrolled Keywords: ageing, DNA repair, data mining, bioinformatics
Subjects: QH301
Divisions: Faculty of Health and Life Sciences
Depositing User: Symplectic Admin
Date Deposited: 22 Aug 2011 15:45
Last Modified: 16 Dec 2022 04:34
DOI: 10.17638/00001475
Supervisors:
  • De Magalhaes, Joao Pedro
  • Vasieva, Olga
URI: https://livrepository.liverpool.ac.uk/id/eprint/1475