The Importance of Statistical Measure when Describing Phenotype



Hajne, Joanna (2015) The Importance of Statistical Measure when Describing Phenotype. PhD thesis, University of Liverpool.

Text
HajneJoa_Jun2015_2042199.pdf - Unspecified
Access to this file is embargoed until Unspecified.
Available under License Creative Commons Attribution.

Download (25MB)

Abstract

Data collected in life sciences studies typically include a genotype description of the organism, a phenotype characterisation of the organism, and experiment-specific covariates, including a description of experimental procedures and laboratory (environmental) conditions. Here, phenotype measurements are taken for wild-type Neurospora crassa growing on agar under standard laboratory conditions. I define a phenotype as a set of traits comprising apical extension velocity, branching angle, and branching distance. I use these traits to model the biologically complex network of a filamentous fungus as a simplified 'In Silico Fungus' consisting of a series of straight lines. Phenotype data are often characterised, by appeal to the central limit theorem, by means and standard deviations. Subsequently, P values are used to show statistical validity. Here, I question whether assuming normality on the basis of the popularity of this approach is always justified. I therefore test three scenarios, each making different assumptions about the data collected. (1) First, I use the most popular approach: I assume the phenotype data come from a continuous normal (Gaussian) distribution, and I predict future measurement outcomes using a normal (Gaussian) parametric approximation. (2) Second, I use the most intuitive approach: I make no assumptions about the data collected and predict future measurement outcomes by drawing values pseudo-randomly from the actual, raw, discrete dataset. (3) Finally, I use a strategy that balances the previous two: I construct a customised, continuous, non-parametric distribution based on the data collected, and I predict future measurement outcomes using kernel density estimation. I then implement all three strategies, (1), (2), and (3), in the in silico fungus programme and compare the outcomes of the computer simulations.
More specifically, I compare the surface coverage, expressed as the proportion of the surface occupied by the fungus. The results show that the differences between the three data regimes (1), (2), and (3) are significant. I therefore conclude that correctly assessing the normality of the data is crucial for the correct interpretation and application of scientific observations. I suspect that this data classification process determines whether biological findings can be applied successfully, especially in fields such as medicine and engineering.
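
The three sampling regimes described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the example branching angles, and the bandwidth rule (a common rule of thumb, 1.06 × standard deviation × n^(-1/5)) are all assumptions made here for illustration, and the Gaussian kernel is one possible choice among many.

```python
import random
import statistics


def sample_normal(data, k, rng):
    """Regime (1): draw from a normal distribution fitted to the data."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [rng.gauss(mu, sigma) for _ in range(k)]


def sample_raw(data, k, rng):
    """Regime (2): draw pseudo-randomly, with replacement, from the raw data."""
    return [rng.choice(data) for _ in range(k)]


def sample_kde(data, k, rng, bandwidth=None):
    """Regime (3): draw from a Gaussian kernel density estimate.

    Sampling from a Gaussian KDE amounts to picking an observed value at
    random and adding Gaussian noise whose standard deviation is the
    bandwidth. The default bandwidth is a rule-of-thumb assumption.
    """
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(data) * len(data) ** (-1 / 5)
    return [rng.gauss(rng.choice(data), bandwidth) for _ in range(k)]


# Hypothetical branching angles in degrees (made-up values, not thesis data).
branching_angles = [62.0, 58.5, 71.2, 66.4, 49.9, 63.3, 80.1, 55.7, 67.8, 60.2]

rng = random.Random(0)  # fixed seed so the draws are reproducible
draws = {
    "normal": sample_normal(branching_angles, 5, rng),
    "raw": sample_raw(branching_angles, 5, rng),
    "kde": sample_kde(branching_angles, 5, rng),
}

for regime, values in draws.items():
    print(regime, [round(v, 1) for v in values])
```

In a simulation, each regime would supply the next value of a trait (for example, the next branching angle) whenever the in silico fungus needs one. Regime (2) can only ever return values that were actually measured, whereas regimes (1) and (3) can return values between or beyond the observations.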

Item Type: Thesis (PhD)
Additional Information: Date: 2015-06 (completed)
Subjects: Q1, QA, QA75, QA76, QD, QR, RZ, T1, TA
Divisions: Faculty of Health and Life Sciences > Faculty of Health and Life Sciences
Depositing User: Symplectic Admin
Date Deposited: 01 Sep 2016 11:01
Last Modified: 17 Dec 2022 02:09
DOI: 10.17638/02042199
URI: https://livrepository.liverpool.ac.uk/id/eprint/2042199