Distributed document and phrase co-embeddings for descriptive clustering



Sato, M, Brockmeier, AJ ORCID: 0000-0002-7293-8140, Kontonatsios, G, Mu, T, Goulermas, JY ORCID: 0000-0003-0381-124X, Tsujii, J and Ananiadou, S
(2017) Distributed document and phrase co-embeddings for descriptive clustering.


Abstract

© 2017 Association for Computational Linguistics. Descriptive document clustering aims to automatically discover groups of semantically related documents and to assign a meaningful label to characterise the content of each cluster. In this paper, we present a descriptive clustering approach that employs a distributed representation model, namely the paragraph vector model, to capture semantic similarities between documents and phrases. The proposed method uses a joint representation of phrases and documents (i.e., a co-embedding) to automatically select a descriptive phrase that best represents each document cluster. We evaluate our method by comparing its performance to an existing state-of-the-art descriptive clustering method that also uses co-embedding but relies on a bag-of-words representation. Results obtained on benchmark datasets demonstrate that the paragraph vector-based method outperforms the existing approach in both identifying clusters and assigning appropriate descriptive labels to them.
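
The phrase-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm or similarity measure is used, so the sketch assumes that cluster assignments are already available, that documents and phrases share one vector space, and that the label is the phrase with the highest cosine similarity to the cluster centroid. The function names (`cosine`, `centroid`, `label_clusters`) and the toy 2-D vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def label_clusters(doc_vecs, assignments, phrase_vecs):
    """For each cluster, return the phrase whose vector is closest to the cluster centroid.

    Assumption: documents and phrases are embedded in the same space,
    so document centroids and phrase vectors can be compared directly.
    """
    clusters = {}
    for vec, cluster_id in zip(doc_vecs, assignments):
        clusters.setdefault(cluster_id, []).append(vec)

    labels = {}
    for cluster_id, members in clusters.items():
        centre = centroid(members)
        labels[cluster_id] = max(
            phrase_vecs, key=lambda phrase: cosine(phrase_vecs[phrase], centre)
        )
    return labels


# Made-up toy embeddings; real vectors would come from a trained model.
doc_vecs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
assignments = [0, 0, 1, 1]  # assumed output of some clustering step
phrase_vecs = {"gene expression": [1.0, 0.0], "stock market": [0.0, 1.0]}

labels = label_clusters(doc_vecs, assignments, phrase_vecs)
print(labels)
```

In practice the document and phrase vectors would be learned jointly, for example with a paragraph vector model as the abstract describes; the toy vectors here only stand in for that step.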

Item Type: Conference or Workshop Item
Depositing User: Symplectic Admin
Date Deposited: 01 Aug 2017 13:47
Last Modified: 14 Nov 2019 06:16
Open Access URL: https://aclweb.org/anthology/E/E17/E17-1093.pdf
URI: http://livrepository.liverpool.ac.uk/id/eprint/3008656