Distributed graph partitioning to optimise collaborative filtering algorithms



Shahzad, Ahmad
(2022) Distributed graph partitioning to optimise collaborative filtering algorithms. PhD thesis, University of Liverpool.

[img] Text
200962182_Dec202.pdf - Author Accepted Manuscript
Access to this file is embargoed until 1 August 2027.

Download (2MB)

Abstract

Graphs provide intuitive abstraction for a lot of the problems. But with advent of data getting bigger it is becoming increasingly difficult to operate big data on a single machine. Hence, big data requires distributed computing which means data could be split across multiple machines. However, that requires classical graph based algorithms and techniques, which are designed for a single machine, to be applied in a distributed manner. One area where this big data challenge is demonstrated is in the gaming industry where millions of players need to be matched together in real-time using collaborative filtering algorithms. Model based collaborative filtering algorithms are slower than memory based collaborative filtering algorithms, but have higher accuracy in terms of prediction and matching. Therefore graph based algorithms and graph partitioning techniques must be devised so that model collaborative filtering can be applied. One way to improve the efficiency of such algorithms is to distribute the data on a cluster of machines so that these algorithms could be executed in almost real-time. The research presented in this thesis is directed at issues related to graph partitioning to improve the efficiency of collaborative filtering algorithms by improving multi-level graph partitioning, spectral graph partitioning and partitioning based on minimum spanning trees, using publicly available data-set movie-lens and synthetically generated data-set based of driveClub game. Synthetically generated data-set is used in absence of actual data-set for driveClub game. However, publicly available data-set of movie-lens has been used to demonstrate the generation of synthetic data-set is representative with respect to original data-set when used for collaborative filtering algorithms.

Item Type: Thesis (PhD)
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 25 Aug 2023 12:41
Last Modified: 25 Aug 2023 12:42
DOI: 10.17638/03166710
Supervisors:
  • Coenen, Frans
  • Thiyagalingam, Jeyarajan
URI: https://livrepository.liverpool.ac.uk/id/eprint/3166710