Combinatorial data analysis for data ordering



Kostopoulos, A
(2016) Combinatorial data analysis for data ordering. PhD thesis, University of Liverpool.

[img] Text
kostopoulos_thesis_final.pdf - Unspecified

Download (20MB)

Abstract

Seriation is a combinatorial optimisation problem that aims to sequence a set of objects such that a natural ordering is created. A large variety of applications exist ranging from archaeology to bioinformatics and text mining. Initially, a thorough and useful quantitative analysis compares different seriation algorithms using the positional proximity coefficient (PPC). This analysis helps the practitioner to understand how similar two algorithms are for a given set of datasets. The first contribution is consensus seriation. This method uses the principles of other consensus based methods to combine different seriation solutions according to the PPC. As it creates a solution that no individual algorithm can create, the usefulness comes in the form of combining different structural elements from each original algorithms. In particular, it is possible to create a solution that combines the local characteristics of one algorithm together with the global characteristics of another. Experimental results show that compared to consensus ranking based methods, using the Hamming, Spearman and Kendall coefficients, the consensus seriation using the PPC gives generally superior results according to the independent accumulated relative generalised anti-Robinson events measure. The second contribution is a metaheuristic for creating good approximation solutions very large seriation problems. This adapted harmony search algorithm makes use of modified crossover operators taken from genetic algorithm literature to optimise the least-squares criterion commonly used in seriation. As for all combinatorial optimisation problems, there is a need for metaheuristics that can produce better solutions quicker. Results show that that algorithm consistently outperforms existing metaheuristic algorithms such as genetic algorithm, particle swarm optimisation, simulated annealing and tabu search as well as the genetic algorithm using the modified crossover operators, with the main advantage of creating a much superior result in a very short iteration frame. These two major contributions offer practitioners and academics with new tools to tackle seriation related problems and a suggested direction for future work is concluded.

Item Type: Thesis (PhD)
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 05 Jan 2017 10:44
Last Modified: 19 Jan 2023 07:25
DOI: 10.17638/03004631
Supervisors:
  • Goulermas, Y
URI: https://livrepository.liverpool.ac.uk/id/eprint/3004631