Sato, Motoki, Brockmeier, Austin J (ORCID: 0000-0002-7293-8140), Kontonatsios, Georgios, Mu, Tingting, Goulermas, John Y, Tsujii, Jun'ichi and Ananiadou, Sophia
(2017)
Distributed Document and Phrase Co-embeddings for Descriptive Clustering.
In: Proceedings of the 15th Conference of the European Chapter of the
Association for Computational Linguistics: Volume 1, Long Papers, 2017-4 - 2017-4.
Abstract
Descriptive document clustering aims to automatically discover groups of semantically related documents and to assign a meaningful label to characterise the content of each cluster. In this paper, we present a descriptive clustering approach that employs a distributed representation model, namely the paragraph vector model, to capture semantic similarities between documents and phrases. The proposed method uses a joint representation of phrases and documents (i.e., a co-embedding) to automatically select a descriptive phrase that best represents each document cluster. We evaluate our method by comparing its performance to an existing state-of-the-art descriptive clustering method that also uses co-embedding but relies on a bag-of-words representation. Results obtained on benchmark datasets demonstrate that the paragraph vector-based method obtains superior performance over the existing approach in both identifying clusters and assigning appropriate descriptive labels to them.
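The general idea of labelling clusters through a shared document and phrase space can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-D vectors, the phrase names, the plain k-means step, and the helper names `kmeans` and `label_clusters` are all assumptions made for illustration. In practice the vectors would come from a trained paragraph vector model rather than being written by hand.

```python
import numpy as np


def kmeans(X, k, n_iter=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centroids = [X[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        new = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new

    # Final assignment against the converged centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(dists, axis=1), centroids


def label_clusters(centroids, phrase_vecs, phrase_names):
    """Pick, for each centroid, the phrase with the highest cosine similarity."""
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    p = phrase_vecs / np.linalg.norm(phrase_vecs, axis=1, keepdims=True)
    best = np.argmax(c @ p.T, axis=1)
    return [phrase_names[i] for i in best]


# Hypothetical co-embedded vectors: documents and phrases share one space.
doc_vecs = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
phrase_names = ["machine learning", "protein folding", "general topic"]
phrase_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])

assign, centroids = kmeans(doc_vecs, k=2)
labels = label_clusters(centroids, phrase_vecs, phrase_names)
print(assign.tolist(), labels)
```

Because documents and phrases live in the same space, choosing a label reduces to a nearest-neighbour lookup around each cluster centroid; the selection criterion used in the paper may differ from the cosine rule assumed here.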
Item Type: | Conference or Workshop Item (Unspecified) |
---|---|
Depositing User: | Symplectic Admin |
Date Deposited: | 01 Aug 2017 13:47 |
Last Modified: | 19 Jan 2023 06:58 |
DOI: | 10.18653/v1/e17-1093 |
Open Access URL: | https://aclweb.org/anthology/E/E17/E17-1093.pdf |
URI: | https://livrepository.liverpool.ac.uk/id/eprint/3008656 |