Evaluating the Similarity of Location-based Corpora Identified in Reddit Comments



Berragan, C ORCID: 0000-0003-2198-2245, Singleton, A ORCID: 0000-0002-2338-2334, Calafiore, A ORCID: 0000-0002-5953-2891 and Morley, J ORCID: 0000-0002-3658-8796
(2023) Evaluating the Similarity of Location-based Corpora Identified in Reddit Comments.


Abstract

Social interaction is typically studied in the context of physical movement, where geographic distance and ease of connectivity influence the strength of interaction between regions. In social media networks, however, these limitations still appear to persist even though interactions do not rely on physical movement, which suggests that non-physical geographic characteristics influence interaction between social communities. Unlike geotags, which provide explicit geographic information about social media users as coordinates, unstructured text offers an alternative perspective on social interaction between regions, allowing comparison of the language used when locations are mentioned in context. Our paper analyses the corpora associated with major cities across the UK. We first vectorise Reddit comments using transformer-based embeddings, which capture semantic information, and then use these embeddings to form unsupervised clusters and measure the similarity between them. We find that distinct groups emerge which broadly conform with established regional identities of locations across the UK, but with interesting deviations.
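The comparison step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It assumes the transformer embeddings have already been computed for each comment and are passed in as plain lists of floats. Averaging comment embeddings into one vector per location, using cosine similarity, and grouping locations with average-linkage agglomerative clustering are illustrative choices, not necessarily the method used in the paper. The names `mean_vector`, `similarity_matrix`, and `average_linkage` and the `place_*` corpora are hypothetical, and the vectors are toy values rather than results.

```python
import math
from itertools import combinations


def mean_vector(vectors):
    """Average equal-length embedding vectors into one vector per corpus."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similarity_matrix(corpora):
    """Pairwise cosine similarity between per-location mean embeddings.

    Keys are (name_a, name_b) tuples with name_a < name_b.
    """
    centroids = {name: mean_vector(vecs) for name, vecs in corpora.items()}
    return {
        (a, b): cosine(centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }


def average_linkage(corpora, n_clusters):
    """Agglomerative clustering on cosine distance (1 - similarity).

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until n_clusters remain (an assumed stand-in technique).
    """
    sims = similarity_matrix(corpora)

    def dist(a, b):
        return 1.0 - sims[(a, b) if a < b else (b, a)]

    clusters = [[name] for name in sorted(corpora)]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
            d = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i (i < j)
        del clusters[j]
    return clusters


# Hypothetical toy data: two 3-dimensional "comment embeddings" per place.
corpora = {
    "place_a": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
    "place_b": [[0.8, 0.0, 0.1], [1.0, 0.1, 0.1]],
    "place_c": [[0.0, 1.0, 0.9], [0.1, 0.9, 1.0]],
    "place_d": [[0.0, 0.8, 1.0], [0.2, 1.0, 0.8]],
}
print(average_linkage(corpora, n_clusters=2))
# [['place_a', 'place_b'], ['place_c', 'place_d']]
```

With real data, each list would hold the embeddings of comments mentioning one city. Production work would more likely use a sentence-embedding library and an optimised clustering implementation than this quadratic loop.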

Item Type: Conference or Workshop Item (Unspecified)
Divisions: Faculty of Science and Engineering > School of Environmental Sciences
Depositing User: Symplectic Admin
Date Deposited: 21 Sep 2023 13:42
Last Modified: 21 Sep 2023 13:43
Open Access URL: https://ceur-ws.org/Vol-3385/paper1.pdf
URI: https://livrepository.liverpool.ac.uk/id/eprint/3172944