TY - GEN
T1 - Image Captioning using Adversarial Networks and Reinforcement Learning
AU - Yan, Shiyang
AU - Wu, Fangyu
AU - Smith, Jeremy S.
AU - Lu, Wenjin
AU - Zhang, Bailing
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/11/26
Y1 - 2018/11/26
N2 - Image captioning is a significant task in artificial intelligence which connects computer vision and natural language processing. With the rapid development of deep learning, the sequence to sequence model with attention, has become one of the main approaches for the task of image captioning. Nevertheless, a significant issue exists in the current framework: The exposure bias problem of Maximum Likelihood Estimation (MLE) in the sequence model. To address this problem, we use generative adversarial networks (GANs) for image captioning, which compensates for the exposure bias problem of MLE and also can generate more realistic captions. GANs, however, cannot be directly applied to a discrete task, like language processing, due to the discontinuity of the data. Hence, we use a reinforcement learning (RL) technique to estimate the gradients for the network. Also, to obtain the intermediate rewards during the process of language generation, a Monte Carlo roll-out sampling method is utilized. Experimental results on the COCO dataset validate the improved effect from each ingredient of the proposed model. The overall effectiveness is also evaluated.
AB - Image captioning is a significant task in artificial intelligence which connects computer vision and natural language processing. With the rapid development of deep learning, the sequence to sequence model with attention, has become one of the main approaches for the task of image captioning. Nevertheless, a significant issue exists in the current framework: The exposure bias problem of Maximum Likelihood Estimation (MLE) in the sequence model. To address this problem, we use generative adversarial networks (GANs) for image captioning, which compensates for the exposure bias problem of MLE and also can generate more realistic captions. GANs, however, cannot be directly applied to a discrete task, like language processing, due to the discontinuity of the data. Hence, we use a reinforcement learning (RL) technique to estimate the gradients for the network. Also, to obtain the intermediate rewards during the process of language generation, a Monte Carlo roll-out sampling method is utilized. Experimental results on the COCO dataset validate the improved effect from each ingredient of the proposed model. The overall effectiveness is also evaluated.
UR - http://www.scopus.com/inward/record.url?scp=85059743054&partnerID=8YFLogxK
U2 - 10.1109/ICPR.2018.8545049
DO - 10.1109/ICPR.2018.8545049
M3 - Conference Proceeding
AN - SCOPUS:85059743054
T3 - Proceedings - International Conference on Pattern Recognition
SP - 248
EP - 253
BT - 2018 24th International Conference on Pattern Recognition, ICPR 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th International Conference on Pattern Recognition, ICPR 2018
Y2 - 20 August 2018 through 24 August 2018
ER -