Bayesian approaches of mixture copulas with applications



Liu, Yujian
(2023) Bayesian approaches of mixture copulas with applications. PhD thesis, University of Liverpool.

201549780_Jan2024.pdf - Author Accepted Manuscript


Abstract

Copula theory has become one of the most important frameworks and methodologies for modeling the dependence among random variables. Rather than relying on single summary measures such as Pearson's linear correlation, copula functions enable us to construct the multivariate distribution of the random variables of interest starting from their marginal distributions. Hence, they give a full description of the dependence structure. The most frequently used copula models are parametric copulas such as the Gaussian, Clayton, and Gumbel copulas. However, in many practical scenarios these copulas fail to fully describe the dependence, as real data often contain complex, multimodal patterns. In addition, classic copulas are mostly studied in their bivariate form, which makes applying copulas to higher-dimensional data non-trivial. This thesis addresses these problems by applying Bayesian samplers to mixture copulas. In particular, we study the problems of estimating, selecting, and simulating the mixture components of copulas using Bayesian approaches. Families of multivariate elliptical and skew-elliptical copulas are given special attention because they extend naturally to higher dimensions. For applications, we apply the proposed approaches to study the dependence among financial markets. We also extend the application of our Bayesian mixture copulas to improve oversampling methods for imbalanced learning problems in data science. The thesis consists of four major parts. In the first part, we apply the Bayesian sparse finite mixture model to copula mixture modeling, which enables us to estimate and select the correct finite mixture copula simultaneously, without having to repeatedly estimate various model forms and compare their AICs or BICs. The second part focuses on the construction of infinite mixture t copulas using the Dirichlet process prior.
Although we concentrate on t copulas because of their usefulness in financial applications, this approach can be extended to more general copulas. It further advances the previously proposed finite mixture Bayesian approaches, although the modeling is more complicated. The third part extends the previous parts to construct non-parametric Bayesian copula mixture models for serially correlated data. In particular, we discuss hidden Markov models (HMMs) with multivariate emission distributions. We use copula theory to decompose the construction of the multivariate emission distributions into univariate marginal distributions and a dependence structure. Many real-life applications of HMMs have an unknown number of states, which the analyst must specify manually if the classic HMM method is used. Introducing the hierarchical Dirichlet process into the copula-HMM model enables us to infer the number of states from the dataset automatically. We give a thorough account of the inference method for this non-parametric Bayesian copula-HMM model. The final part introduces and studies evaluation metrics for imbalanced learning problems and applies the mixture copula approach to address data imbalance. One major obstacle to applying copulas to imbalanced datasets is the high dimensionality of the features in many tasks. In addition, data science applications often include discrete-valued features, while most of the copula literature deals only with continuous random vectors. Therefore, we develop MCMC approaches for estimating mixed-valued copulas (i.e., copulas that contain both continuous and discrete-valued variables) and apply them to model the dataset and perform the oversampling.
The Bayesian approach is useful in these tasks because real applications often involve large, high-dimensional datasets, whereas classic MLE approaches struggle in this setting due to the exponential complexity of evaluating the discrete dimensions. The approaches are applied to simulated datasets to demonstrate their validity. In addition, an oversampling task on real data is performed using mixture copulas, and the results are compared with those of classic random oversampling and SMOTE.
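
As a rough illustration of the idea of simulating from a mixture copula and mapping the draws back through marginal distributions (the basic mechanism behind copula-based oversampling), a minimal Python sketch follows. It is not the thesis's method: it assumes bivariate Gaussian copula components for simplicity (the thesis concentrates on elliptical, skew-elliptical, and t copulas), it takes the mixture weights and correlations as given rather than inferring them by MCMC, and the function names sample_gaussian_copula, sample_mixture_copula, and to_margins are hypothetical.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probability transform
_EPS = 1e-12                # keeps uniforms strictly inside (0, 1)


def sample_gaussian_copula(rho, rng):
    """Draw one (u1, u2) pair from a bivariate Gaussian copula with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    # Build a second standard normal with correlation rho to z1.
    z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    u1 = min(max(_STD_NORMAL.cdf(z1), _EPS), 1.0 - _EPS)
    u2 = min(max(_STD_NORMAL.cdf(z2), _EPS), 1.0 - _EPS)
    return u1, u2


def sample_mixture_copula(weights, rhos, n, seed=0):
    """Draw n pairs from a finite mixture of Gaussian copulas.

    weights and rhos are assumed known here; in a Bayesian treatment
    they would be estimated (e.g. by MCMC) rather than fixed.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Pick a mixture component with probability proportional to its weight.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        draws.append(sample_gaussian_copula(rhos[k], rng))
    return draws


def to_margins(draws, quantile_fns):
    """Map copula draws to the data scale via one quantile function per margin."""
    return [tuple(q(u) for q, u in zip(quantile_fns, pair)) for pair in draws]


if __name__ == "__main__":
    u = sample_mixture_copula(weights=[0.6, 0.4], rhos=[0.8, -0.5], n=5, seed=1)
    # Illustrative margins only: N(0, 1) and N(10, 2).
    x = to_margins(u, [NormalDist(0, 1).inv_cdf, NormalDist(10, 2).inv_cdf])
    for row in x:
        print(row)
```

In an oversampling setting, the quantile functions would typically be estimated from the minority class (for example, empirical quantiles), and the synthetic rows returned by to_margins would be appended to that class; those choices are assumptions of this sketch rather than details stated in the abstract.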

Item Type: Thesis (PhD)
Uncontrolled Keywords: copula theory, mixture copula models, Bayesian methods, MCMC, nonparametric Bayesian approach
Divisions: Faculty of Science and Engineering > School of Physical Sciences
Depositing User: Symplectic Admin
Date Deposited: 13 Feb 2024 15:32
Last Modified: 13 Feb 2024 15:33
DOI: 10.17638/03177731
Supervisors:
  • Xie, Dejun
URI: https://livrepository.liverpool.ac.uk/id/eprint/3177731