Disease Surveillance using Bayesian Methods



Rosato, Conor ORCID: 0000-0001-8394-7344
(2023) Disease Surveillance using Bayesian Methods. Doctor of Philosophy thesis, University of Liverpool.

201116060_Oct2023.pdf - Author Accepted Manuscript


Abstract

Developing Markov chain Monte Carlo (MCMC) algorithms has been an active area of research. Extensions of the original Metropolis-Hastings random walk (MHRW) algorithm, such as the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo (HMC) and the No-U-Turn Sampler (NUTS), use gradient information about the posterior to propose parameters in regions of higher probability within the target. Particle Markov chain Monte Carlo (p-MCMC) is a related parameter estimation algorithm that uses a particle filter to calculate an unbiased estimate of the likelihood, the logarithm of which can be used within the MHRW algorithm. However, as noted in the literature, obtaining gradients of the log-likelihood with respect to the parameters is difficult because operations inherent to the particle filter, such as resampling, are non-differentiable. This obstacle has hindered the use of gradient-based proposals within p-MCMC. This thesis therefore considers a novel method for obtaining the gradient of the log-likelihood with respect to the parameters by fixing the random number seed within the particle filter. This allows the particle filter to be posed as a deterministic function: running it multiple times yields the same resampling realisations, log-likelihood and associated gradient estimates. Because different parameter values can lead to different resampling realisations, the resulting estimates of the log-likelihood and its gradient are piecewise continuous functions of the parameters. It is shown that these estimates are still compatible with gradient-based proposals such as MALA, HMC and NUTS. These samplers are compared when estimating the parameters of two state-space models. Results indicate that, although NUTS can make multiple gradient evaluations per MCMC iteration, it can produce more accurate estimates in shorter computation time. Frameworks for describing the differentiable particle filter and NUTS in PyTorch and PyMC3, respectively, are also provided.
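The fixed-seed idea can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional AR(1) state-space model with Gaussian noise, a bootstrap particle filter with multinomial resampling, and NumPy rather than the PyTorch framework mentioned above, so it shows only that the log-likelihood estimate becomes a deterministic function of the parameter; it does not compute gradients. The function name `pf_log_likelihood` and all model choices are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def pf_log_likelihood(theta, observations, n_particles=500, seed=0):
    """Bootstrap particle filter for a hypothetical AR(1) state-space model:
        x_t = theta * x_{t-1} + v_t,  v_t ~ N(0, 1)
        y_t = x_t + w_t,              w_t ~ N(0, 1)
    Returns the log of the (unbiased) likelihood estimate."""
    # Fixing the seed fixes every random number the filter consumes.
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n_particles)  # initial particles, x_0 ~ N(0, 1)
    log_lik = 0.0
    for y in observations:
        # Propagate particles through the transition density.
        x = theta * x + rng.normal(0.0, 1.0, size=n_particles)
        # Log-weights from the Gaussian observation density N(y; x, 1).
        log_w = -0.5 * (y - x) ** 2 - 0.5 * np.log(2.0 * np.pi)
        shift = log_w.max()
        w = np.exp(log_w - shift)
        # Incremental log-likelihood: log of the mean unnormalised weight.
        log_lik += shift + np.log(w.mean())
        # Multinomial resampling by inverting the weight CDF at fixed uniforms.
        cdf = np.cumsum(w)
        cdf /= cdf[-1]
        u = rng.random(n_particles)
        idx = np.searchsorted(cdf, u, side="right")
        x = x[idx]
    return float(log_lik)

# Hypothetical synthetic data generated with theta = 0.8.
data_rng = np.random.default_rng(42)
x_true, ys = 0.0, []
for _ in range(50):
    x_true = 0.8 * x_true + data_rng.normal()
    ys.append(x_true + data_rng.normal())

ll_a = pf_log_likelihood(0.8, ys, seed=1)
ll_b = pf_log_likelihood(0.8, ys, seed=1)
print(ll_a == ll_b)  # True
```

Because the uniforms `u` are fixed by the seed, a small change in `theta` leaves the resampled indices unchanged until a cumulative weight crosses one of the `u` values, at which point the estimate jumps; this is one way to see the piecewise continuity described above. A differentiable version would, under the same assumptions, compute gradients by automatic differentiation through the propagation and weighting steps.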
Within PyTorch and PyMC3, derivatives and partial derivatives can be calculated via automatic differentiation. Particle filters have been used extensively to model and track infectious disease epidemics, with p-MCMC used to estimate the parameters of these models. Although gradient-based proposals are used in epidemiological modelling with non-particle methods, the standard proposal within p-MCMC is the MHRW. When the novel differentiable particle filter is applied to two epidemiological models, NUTS can recover the correct parameters in shorter run time than the MHRW proposal.

In the context of epidemiological modelling, it is essential for public health officials to understand how a disease spreads through a population. This has recently come to the forefront with the emergence of COVID-19. At the beginning of the pandemic, it was vital to gather accurate open-source datasets from which to infer how quickly the virus was spreading. As well as estimating parameters, MCMC algorithms can forecast quantities of interest. Evaluating these forecasts with simple scoring rules indicates how well the model represents reality. The normalised estimation error squared (NEES) scoring rule can detect shortcomings within a model, such as incorrect parameters, that result in forecasts that are over-confident or over-cautious. A detailed explanation is provided of why it is more desirable for forecasts to be over-cautious than over-confident. NEES can also be used to evaluate how useful different open-source datasets are when making future predictions.

A novel machine learning framework for detecting COVID-19 symptomatic tweets in real time in multiple languages is outlined. By collating the symptomatic tweets from the previous 24 hours, a time series can be constructed for each geographic region.
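A minimal sketch of the daily aggregation step follows. It assumes each tweet has already been labelled as symptomatic or not by an upstream classifier (not shown) and tagged with a region; the record layout and the function name `daily_symptomatic_series` are hypothetical. Calendar-day bins stand in for the 24-hour windows, and days with no symptomatic tweets are filled with zeros so that every region's series has the same length.

```python
from collections import Counter
from datetime import datetime, timedelta

def daily_symptomatic_series(tweets):
    """tweets: iterable of (timestamp, region, is_symptomatic) tuples.
    Returns {region: [(day, count), ...]} with one entry per calendar day."""
    tweets = list(tweets)
    if not tweets:
        return {}
    # Count symptomatic tweets per (region, day); Counter returns 0 for missing keys.
    counts = Counter((region, ts.date()) for ts, region, flag in tweets if flag)
    start = min(ts.date() for ts, _, _ in tweets)
    end = max(ts.date() for ts, _, _ in tweets)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    regions = sorted({region for _, region, _ in tweets})
    return {r: [(d, counts[(r, d)]) for d in days] for r in regions}

# Hypothetical example records.
tweets = [
    (datetime(2020, 4, 1, 9, 0), "UK", True),
    (datetime(2020, 4, 1, 17, 30), "UK", True),
    (datetime(2020, 4, 1, 12, 0), "US", False),
    (datetime(2020, 4, 3, 8, 15), "US", True),
]
series = daily_symptomatic_series(tweets)
print([c for _, c in series["UK"]])  # [2, 0, 0]
print([c for _, c in series["US"]])  # [0, 0, 1]
```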
It is shown that, compared with traditional data sources such as positive test results, ingesting tweet data can result in more consistent and accurate COVID-19 death predictions in the United States, the United Kingdom, and European and South American countries.
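The NEES check can be sketched for Gaussian forecasts summarised by a mean and covariance. Using the standard target-tracking definition, the NEES of one forecast is (x - m)ᵀ P⁻¹ (x - m), where x is the realised value, m the forecast mean and P the forecast covariance; for a consistent forecast its expected value equals the dimension d, so averages well above d suggest over-confidence (P too small) and averages well below d suggest over-caution (P too large). The function names and the synthetic data are hypothetical, and the variant used in the thesis may differ in detail.

```python
import numpy as np

def nees(truth, mean, cov):
    """NEES of one forecast: (x - m)^T P^{-1} (x - m)."""
    err = np.atleast_1d(np.asarray(truth, dtype=float) - np.asarray(mean, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    return float(err @ np.linalg.solve(cov, err))

def average_nees(truths, means, covs):
    """Average NEES over a set of forecasts; compare with the dimension d."""
    return float(np.mean([nees(t, m, c) for t, m, c in zip(truths, means, covs)]))

# Hypothetical scalar example (d = 1): outcomes really are N(0, 1).
rng = np.random.default_rng(0)
outcomes = rng.normal(0.0, 1.0, size=5000)
means = np.zeros_like(outcomes)

calibrated = average_nees(outcomes, means, np.full(5000, 1.0))      # close to 1
over_confident = average_nees(outcomes, means, np.full(5000, 0.25))  # close to 4
over_cautious = average_nees(outcomes, means, np.full(5000, 4.0))    # close to 0.25
```

Here the over-confident forecast reports a variance four times too small, which inflates the average NEES by a factor of four, while the over-cautious forecast deflates it by the same factor.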

Item Type: Thesis (Doctor of Philosophy)
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 13 Nov 2023 16:59
Last Modified: 13 Nov 2023 16:59
DOI: 10.17638/03173609
Supervisors:
  • Maskell, Simon
URI: https://livrepository.liverpool.ac.uk/id/eprint/3173609