Fain, Mikhail, Twomey, Niall, Ponikar, Andrey, Fox, Ryan and Bollegala, Danushka (2019) Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA. CoRR, abs/19.
Text: 1911.12763v1.pdf - Submitted version (611kB)
Abstract
We propose a novel non-parametric method for cross-modal recipe retrieval which is applied on top of precomputed image and text embeddings. By combining our method with standard approaches for building image and text encoders, trained independently with a self-supervised classification objective, we create a baseline model which outperforms most existing methods on a challenging image-to-recipe task. We also use our method for comparing image and text encoders trained using different modern approaches, thus addressing the issues hindering the development of novel methods for cross-modal recipe retrieval. We demonstrate how to use the insights from model comparison and extend our baseline model with a standard triplet loss, which improves on the state of the art on the Recipe1M dataset by a large margin while using only precomputed features and with much less complexity than existing methods. Further, our approach readily generalizes beyond recipe retrieval to other challenging domains, achieving state-of-the-art performance on Politics and GoodNews cross-modal retrieval tasks.
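For readers unfamiliar with the two generic building blocks the abstract names, the following is a minimal sketch of a standard triplet loss and of nearest-neighbour retrieval over precomputed embeddings. It is not the authors' implementation: the function names, the cosine distance, and the margin value of 0.3 are all assumptions chosen for illustration, and the paper's own non-parametric method is not reproduced here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Mean hinge triplet loss using cosine distance.

    The margin of 0.3 is an illustrative assumption, not a value from the paper.
    Each argument is an array of shape (batch, dim).
    """
    a, p, n = (l2_normalize(np.asarray(m, dtype=float))
               for m in (anchor, positive, negative))
    d_pos = 1.0 - np.sum(a * p, axis=1)  # distance to the matching item
    d_neg = 1.0 - np.sum(a * n, axis=1)  # distance to a non-matching item
    return float(np.mean(np.maximum(0.0, margin + d_pos - d_neg)))


def retrieve(image_emb, recipe_emb, k=1):
    """For each image embedding, return indices of the k most similar recipes."""
    sims = (l2_normalize(np.asarray(image_emb, dtype=float))
            @ l2_normalize(np.asarray(recipe_emb, dtype=float)).T)
    return np.argsort(-sims, axis=1)[:, :k]
```

In this sketch the encoders are treated as frozen: both functions operate only on embedding matrices that have already been computed, which mirrors the abstract's emphasis on precomputed features but says nothing about how the authors train or combine them.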
Item Type: | Article |
---|---|
Uncontrolled Keywords: | cs.CV |
Depositing User: | Symplectic Admin |
Date Deposited: | 11 Dec 2019 13:31 |
Last Modified: | 19 Jan 2023 00:13 |
URI: | https://livrepository.liverpool.ac.uk/id/eprint/3065915 |