Particle swarm Optimized Density-based Clustering and Classification: Supervised and unsupervised learning approaches



Guan, Chun, Yuen, Kevin Kam Fung ORCID: 0000-0003-1497-2575 and Coenen, Frans ORCID: 0000-0003-1026-6649
(2019) Particle swarm Optimized Density-based Clustering and Classification: Supervised and unsupervised learning approaches. Swarm and Evolutionary Computation, 44. pp. 876-896.

Abstract

Clustering and classification, two pattern recognition techniques in machine learning, have been applied in many domains. Density-based clustering is an essential family of clustering algorithms. The best-known density-based method is Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which can find arbitrarily shaped clusters in datasets. DBSCAN has three drawbacks: first, its parameters are hard to set; second, the number of clusters cannot be controlled by the user; and third, DBSCAN cannot be used directly as a classifier. This paper proposes a novel Particle swarm Optimized Density-based Clustering and Classification (PODCC) method, designed to offset these drawbacks. Particle Swarm Optimization (PSO), a widely used Evolutionary and Swarm Algorithm (ESA), has been applied to optimization problems in many research domains, including data analytics. In PODCC, a variant of PSO, SPSO-2011, searches the parameter space to identify the best parameters for density-based clustering and classification. PODCC can operate in both supervised and unsupervised learning modes by applying the appropriate fitness function proposed in this paper. With the proposed fitness function, users can set the number of clusters as an input to PODCC. The proposed method was evaluated on ten synthetic datasets and ten benchmark datasets selected from various open sources. The experimental results indicate that PODCC can perform better than some established methods, especially on imbalanced datasets.
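
The idea summarised in the abstract, a swarm searching the DBSCAN parameter space under a fitness function that takes the desired number of clusters as input, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the swarm is a plain global-best PSO rather than SPSO-2011, the fitness function (distance from the target cluster count plus the fraction of noise points) is a simple stand-in for the fitness functions the authors propose, and the names dbscan, fitness, pso_tune, the search bounds, and the toy data are hypothetical.

```python
import math
import random

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point, so keep growing the cluster
                queue.extend(nb)
    return labels

def fitness(points, eps, min_pts, target_k):
    """Illustrative unsupervised fitness (lower is better), NOT the paper's."""
    labels = dbscan(points, eps, min_pts)
    k = len({label for label in labels if label != -1})
    noise_fraction = labels.count(-1) / len(labels)
    return abs(k - target_k) + noise_fraction

def pso_tune(points, target_k, bounds=((0.05, 3.0), (2.0, 10.0)),
             n_particles=10, n_iter=20, seed=0):
    """Global-best PSO over (eps, min_pts); an assumed stand-in for SPSO-2011."""
    rng = random.Random(seed)
    w, c1, c2 = 0.72, 1.19, 1.19  # inertia and acceleration weights (assumed)

    def evaluate(p):
        return fitness(points, p[0], max(2, round(p[1])), target_k)

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [evaluate(p) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # clamp to bounds
            f = evaluate(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest[0], max(2, round(gbest[1])), gbest_f

# Toy data (hypothetical): two well-separated Gaussian blobs.
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)]
blob_b = [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(30)]
data = blob_a + blob_b

eps, min_pts, score = pso_tune(data, target_k=2)
labels = dbscan(data, eps, min_pts)
print(f"eps={eps:.3f} min_pts={min_pts} fitness={score:.3f}")
```

A supervised variant could, under the same assumptions, replace the fitness function with a classification error computed on labelled training data, which is the general direction the abstract describes for using density-based clustering as a classifier.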

Item Type: Article
Uncontrolled Keywords: Networking and Information Technology R&D (NITRD)
Depositing User: Symplectic Admin
Date Deposited: 05 Dec 2018 09:54
Last Modified: 14 Mar 2024 21:45
DOI: 10.1016/j.swevo.2018.09.008
URI: https://livrepository.liverpool.ac.uk/id/eprint/3029501