Nonnegative matrix analysis for data clustering and compression



Gong, Liyun
Nonnegative matrix analysis for data clustering and compression. PhD thesis, University of Liverpool.

Text
GongLiy_Feb2015_2007800.pdf - Unspecified
Available under License Creative Commons Attribution.


Abstract

Nonnegative matrix factorization (NMF) has become an increasingly popular data processing tool in recent years and is widely used in communities including computer vision, text mining and bioinformatics. It approximates each data sample in a collection by a linear combination of nonnegative basis vectors with nonnegative weights. This often enables meaningful interpretation of the data, motivates useful insights and facilitates tasks such as data compression, clustering and classification. As a result, NMF plays various active roles in data analysis, e.g., as a dimensionality reduction tool [11, 75], clustering tool [94, 82, 13, 39], feature engine [40] and source separation tool [38]. Several NMF-based methods are proposed in this thesis. First, a modification of k-means clustering is used as an initialisation method for NMF; experimental results show that it improves compression performance. Second, independent principal component analysis (IPCA), which combines the advantages of principal component analysis (PCA) and independent component analysis (ICA), is used as an initialisation method for NMF and improves clustering accuracy. Third, a new evolutionary optimisation strategy for NMF is proposed, driven by three update schemes in the solution space, namely the NMF rule (or original movement), the firefly rule (or beta movement) and the survival of the fittest rule (or best movement). This update strategy addresses both the clustering and compression problems by using different system objective functions based on clustering and compression quality measurements. A hybrid initialisation approach includes state-of-the-art NMF initialisation methods as seed knowledge to increase the rate of convergence. There is no limit on the number or type of initialisation methods that the proposed optimisation approach can use.
Numerous computer experiments on benchmark datasets verify the theoretical results and compare the techniques in terms of clustering and compression accuracy. Experimental results show that these methods improve clustering and compression performance. In the application to an EEG dataset, we employed several standard algorithms to cluster preprocessed EEG data. We also explored ensemble clustering to obtain tight clusters. The results support three statements: firstly, normalization is necessary for this EEG brain dataset to obtain reasonable clustering; secondly, k-means, k-medoids and HC-Ward provide relatively better clustering results; thirdly, ensemble clustering enables us to tune the tightness of the clusters so that the research can be focused.
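
To make the factorization described in the abstract concrete, the following is a minimal sketch of plain NMF using the standard multiplicative updates for the Frobenius objective, with random initialisation. It is not the thesis's method: the k-means, IPCA and evolutionary initialisation and update schemes are not reproduced here. The function names, the synthetic data and the argmax cluster assignment are illustrative assumptions.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (features x samples) as W @ H.

    Uses the standard multiplicative updates for the Frobenius norm
    objective. Random initialisation is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, rank)) + eps  # nonnegative basis vectors
    H = rng.random((rank, n_samples)) + eps   # nonnegative weights
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H
        # stay nonnegative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    """Relative Frobenius reconstruction error, a simple compression measure."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Synthetic nonnegative data with an exact rank-4 structure (illustrative only).
data_rng = np.random.default_rng(1)
V = data_rng.random((20, 4)) @ data_rng.random((4, 30))

W, H = nmf_multiplicative(V, rank=4)
err = relative_error(V, W, H)

# One common way to read clusters off the factorization: assign each
# sample to the basis vector with the largest weight.
labels = H.argmax(axis=0)
```

Each column of `V` is approximated by the columns of `W` weighted by the matching column of `H`, which is the nonnegative linear combination the abstract refers to. A smaller `rank` gives stronger compression at the cost of a larger reconstruction error.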

Item Type: Thesis (PhD)
Additional Information: Date: 2015-02-09 (completed)
Subjects: ?? Q1 ??
?? T1 ??
Depositing User: Symplectic Admin
Date Deposited: 08 Sep 2015 14:27
Last Modified: 17 Dec 2022 01:40
DOI: 10.17638/02007800
URI: https://livrepository.liverpool.ac.uk/id/eprint/2007800