Image Captioning using Adversarial Networks and Reinforcement Learning



Yan, Shiyang, Wu, Fangyu, Smith, Jeremy S (ORCID: 0000-0002-0212-2365), Lu, Wenjin and Zhang, Bailing
(2018) Image Captioning using Adversarial Networks and Reinforcement Learning. In: 2018 24th International Conference on Pattern Recognition (ICPR), 2018-8-20 - 2018-8-24.

bare_conf.pdf - Author Accepted Manuscript

Abstract

Image captioning is a significant task in artificial intelligence which connects computer vision and natural language processing. With the rapid development of deep learning, the sequence-to-sequence model with attention has become one of the main approaches to image captioning. Nevertheless, a significant issue exists in the current framework: the exposure bias problem of Maximum Likelihood Estimation (MLE) in the sequence model. To address this problem, we use generative adversarial networks (GANs) for image captioning, which compensate for the exposure bias of MLE and can also generate more realistic captions. GANs, however, cannot be directly applied to a discrete task like language generation, because the data are not differentiable. Hence, we use a reinforcement learning (RL) technique to estimate the gradients for the network. Also, to obtain intermediate rewards during language generation, a Monte Carlo roll-out sampling method is utilized. Experimental results on the COCO dataset validate the contribution of each ingredient of the proposed model, and the overall effectiveness is also evaluated.
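The combination described in the abstract — a discriminator scoring generated captions, Monte Carlo roll-outs providing intermediate rewards for partial sequences, and a REINFORCE-style policy gradient passing those rewards back to the generator — can be illustrated with a minimal toy sketch. This is not the authors' implementation: the per-step logits table stands in for an LSTM decoder, and the discriminator is a hypothetical stand-in that simply scores how often a chosen token appears.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_LEN, N_ROLLOUTS = 5, 4, 16

# Toy "generator": one softmax policy per time step (stands in for a decoder).
logits = rng.normal(size=(MAX_LEN, VOCAB))

def policy(t):
    """Softmax distribution over the vocabulary at step t."""
    e = np.exp(logits[t] - logits[t].max())
    return e / e.sum()

def discriminator(seq):
    """Hypothetical stand-in for a trained discriminator: a score in [0, 1]
    (here, the fraction of tokens equal to 0)."""
    return float(np.mean([tok == 0 for tok in seq]))

def rollout_reward(prefix, t):
    """Monte Carlo roll-out: complete the partial caption N_ROLLOUTS times
    and average the discriminator scores to estimate an intermediate reward."""
    scores = []
    for _ in range(N_ROLLOUTS):
        seq = list(prefix)
        for u in range(t + 1, MAX_LEN):
            seq.append(rng.choice(VOCAB, p=policy(u)))
        scores.append(discriminator(seq))
    return float(np.mean(scores))

# Sample one caption and accumulate per-step REINFORCE gradients.
seq, grad = [], np.zeros_like(logits)
for t in range(MAX_LEN):
    p = policy(t)
    tok = rng.choice(VOCAB, p=p)
    seq.append(tok)
    # Terminal step: score the full caption; otherwise use the roll-out estimate.
    r = discriminator(seq) if t == MAX_LEN - 1 else rollout_reward(seq, t)
    # REINFORCE for a softmax policy: grad log pi(tok) = one_hot(tok) - p.
    grad[t] = (np.eye(VOCAB)[tok] - p) * r

logits += 0.1 * grad  # one policy-gradient ascent step on the generator
```

In the paper's full setup the reward comes from an adversarially trained discriminator rather than this fixed scoring rule, and the gradient flows through the decoder's parameters rather than a logits table, but the reward-weighted log-probability update has the same shape.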

Item Type: Conference or Workshop Item (Unspecified)
Depositing User: Symplectic Admin
Date Deposited: 19 Feb 2019 10:09
Last Modified: 19 Jan 2023 01:02
DOI: 10.1109/icpr.2018.8545049
Related URLs:
URI: https://livrepository.liverpool.ac.uk/id/eprint/3033097