Kaneko, M and Bollegala, D
ORCID: 0000-0003-4476-7003
(2020)
Autoencoding Improves Pre-trained Word Embeddings
In: Proceedings of the 28th International Conference on Computational Linguistics, December 2020, Virtual.
Text: main.pdf - Author Accepted Manuscript (211kB)
Abstract
Prior work investigating the geometry of pre-trained word embeddings has shown that word embeddings are distributed in a narrow cone, and that by centering and projecting using principal component vectors one can increase the accuracy of a given set of pre-trained word embeddings. However, theoretically, this post-processing step is equivalent to applying a linear autoencoder that minimises the squared ℓ2 reconstruction error. This result contradicts prior work (Mu and Viswanath, 2018) that proposed to remove the top principal components from pre-trained embeddings. We experimentally verify our theoretical claims and show that retaining the top principal components is indeed useful for improving pre-trained word embeddings, without requiring access to additional linguistic resources or labeled data.
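The equivalence claimed above can be checked numerically. The sketch below is not the authors' code; the random matrix is a hypothetical stand-in for real pre-trained embeddings (e.g. GloVe or word2vec), and `k` is an assumed bottleneck width. It verifies that projecting centred embeddings onto the top-k principal components yields exactly the best rank-k reconstruction in squared ℓ2 (Frobenius) error, which by the Eckart-Young theorem is the optimum a linear autoencoder with a k-dimensional bottleneck can attain.

```python
import numpy as np

# Hypothetical stand-in for pre-trained word embeddings (rows = words,
# columns = dimensions); real embeddings such as GloVe or word2vec would
# be loaded here instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))

mu = X.mean(axis=0)
Xc = X - mu                                 # centering step

k = 10                                      # assumed bottleneck width
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Post-processing: project the centred embeddings onto the top-k principal
# component directions (retaining the top components, not removing them).
proj = Xc @ Vt[:k].T @ Vt[:k]

# Best rank-k reconstruction of Xc in squared l2 (Frobenius) error, which is
# the optimum a linear autoencoder with a k-dimensional bottleneck can reach
# (Eckart-Young theorem).
rank_k = (U[:, :k] * S[:k]) @ Vt[:k]

print(np.allclose(proj, rank_k))            # True: the two coincide
```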
| Item Type: | Conference Item (Unspecified) |
|---|---|
| Depositing User: | Symplectic Admin |
| Date Deposited: | 03 Nov 2020 10:26 |
| Last Modified: | 24 Jan 2026 02:44 |
| DOI: | 10.18653/v1/2020.coling-main.149 |
| URI: | https://livrepository.liverpool.ac.uk/id/eprint/3105624 |