Tan, Zhaorui, Yang, Xi, Ye, Zihan, Wang, Qiufeng, Yan, Yuyao, Nguyen, Anh (ORCID: 0000-0002-1449-211X) and Huang, Kaizhu (2023) Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation. Pattern Recognition, 144, Article 109883.
Text: 2210.15235 (1).pdf - Author Accepted Manuscript. Access to this file is embargoed until 14 August 2024.
Abstract
Generating high-quality images from text remains a challenge in visual-language understanding, with text-image consistency being a major concern. In particular, the most popular metric, R-precision, may not accurately reflect text-image consistency, leading to misleading semantics in generated images. Despite its significance, designing a better text-image consistency metric surprisingly remains under-explored in the community. In this paper, we take a further step forward and develop a novel CLIP-based metric, Semantic Similarity Distance (SSD), which is both theoretically founded from a distributional viewpoint and empirically verified on benchmark datasets. We also introduce Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which use two novel components to mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments indicates that, under the guidance of SSD, our PDF-GAN can markedly improve the consistency between texts and images while preserving acceptable image quality on the CUB and COCO datasets.
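The abstract does not spell out SSD's formulation. As a hedged point of reference only, the sketch below shows the underlying CLIP text-image similarity signal that CLIP-based consistency metrics build on, using the Hugging Face transformers API. The model checkpoint, caption, and image path are illustrative assumptions, not details taken from the paper, and this is not the authors' SSD metric itself.

```python
# Minimal sketch of a CLIP-based text-image similarity score.
# NOTE: this is NOT the paper's SSD metric, only the cosine-similarity
# signal that CLIP-based consistency metrics are typically built on.
# Checkpoint name, caption, and image path are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(caption: str, image_path: str) -> float:
    """Cosine similarity between CLIP text and image embeddings."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # L2-normalize the projected embeddings, then take their dot product.
    text_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    image_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    return float((text_emb * image_emb).sum())

# Example: score a generated image against its prompt (hypothetical file).
# print(clip_similarity("a small yellow bird with black wings", "generated.png"))
```

A higher score indicates closer alignment between the caption and the image in CLIP's joint embedding space; a distributional metric such as SSD would aggregate such alignment signals rather than report a single raw similarity.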
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Text-to-image, Image generation, Generative adversarial networks, Semantic consistency |
| Divisions: | Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science |
| Depositing User: | Symplectic Admin |
| Date Deposited: | 04 Sep 2023 08:56 |
| Last Modified: | 18 Oct 2023 18:11 |
| DOI: | 10.1016/j.patcog.2023.109883 |
| URI: | https://livrepository.liverpool.ac.uk/id/eprint/3172512 |