Image Captioning using Adversarial Networks and Reinforcement Learning

Shiyang Yan; Fangyu Wu; Jeremy S. Smith; Wenjin Lu; Bailing Zhang

doi:10.1109/ICPR.2018.8545049

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

18 Citations (Scopus)

Abstract

Image captioning is a significant task in artificial intelligence that connects computer vision and natural language processing. With the rapid development of deep learning, the sequence-to-sequence model with attention has become one of the main approaches to image captioning. Nevertheless, a significant issue exists in the current framework: the exposure bias problem of Maximum Likelihood Estimation (MLE) in the sequence model. To address this problem, we use generative adversarial networks (GANs) for image captioning, which compensate for the exposure bias of MLE and can also generate more realistic captions. GANs, however, cannot be directly applied to a discrete task such as language processing because of the discontinuity of the data. Hence, we use a reinforcement learning (RL) technique to estimate the gradients for the network. In addition, to obtain intermediate rewards during language generation, a Monte Carlo roll-out sampling method is used. Experimental results on the COCO dataset validate the improvement contributed by each component of the proposed model. The overall effectiveness is also evaluated.
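The abstract's combination of a policy-gradient update with Monte Carlo roll-outs can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a context-free toy generator (a single logit vector over a five-word vocabulary, with no image conditioning), a hand-written `toy_discriminator` standing in for a trained discriminator, and fixed-length captions. All names and constants (`VOCAB`, `MAX_LEN`, `rollout_reward`, `reinforce_step`) are hypothetical. It shows only how a partial caption might receive an intermediate reward by being completed several times and scored, and how that reward weights the log-probability gradient.

```python
import math
import random

# Hypothetical toy setup: a tiny vocabulary and fixed-length captions.
VOCAB = ["a", "dog", "cat", "runs", "sleeps"]
DOG = VOCAB.index("dog")
MAX_LEN = 5


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, rng):
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def toy_discriminator(tokens):
    """Stand-in for a trained discriminator (assumption): scores a complete
    caption in [0, 1] as the fraction of its tokens that are 'dog'."""
    return sum(1 for t in tokens if t == DOG) / MAX_LEN


def rollout_reward(prefix, logits, discriminator, n_rollouts, rng):
    """Intermediate reward for a partial caption: complete it n_rollouts
    times with the current generator and average the discriminator scores."""
    if len(prefix) == MAX_LEN:
        return discriminator(prefix)
    total = 0.0
    for _ in range(n_rollouts):
        completed = list(prefix)
        while len(completed) < MAX_LEN:
            completed.append(sample_token(logits, rng))
        total += discriminator(completed)
    return total / n_rollouts


def reinforce_step(logits, discriminator, n_rollouts, lr, rng):
    """One REINFORCE-style update: sample a caption token by token and weight
    each token's log-probability gradient by its roll-out reward."""
    probs = softmax(logits)
    caption = []
    grad = [0.0] * len(logits)
    for _ in range(MAX_LEN):
        token = sample_token(logits, rng)
        caption.append(token)
        reward = rollout_reward(caption, logits, discriminator, n_rollouts, rng)
        # Gradient of log softmax(logits)[token] w.r.t. logits is onehot - probs.
        for k in range(len(logits)):
            grad[k] += reward * ((1.0 if k == token else 0.0) - probs[k])
    new_logits = [l + lr * g for l, g in zip(logits, grad)]
    return new_logits, caption


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0] * len(VOCAB)
    for _ in range(200):
        logits, _ = reinforce_step(logits, toy_discriminator, 8, 0.1, rng)
    print("P(dog) after training:", softmax(logits)[DOG])
```

In a real captioning system the generator would condition on image features and previously generated words, and the discriminator would be trained jointly; the sketch omits both to keep the reward-estimation step visible.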

Original language	English
Title of host publication	2018 24th International Conference on Pattern Recognition, ICPR 2018
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	248-253
Number of pages	6
ISBN (Electronic)	9781538637883
DOIs	https://doi.org/10.1109/ICPR.2018.8545049
Publication status	Published - 26 Nov 2018
Event	24th International Conference on Pattern Recognition, ICPR 2018 - Beijing, China; Duration: 20 Aug 2018 → 24 Aug 2018

Publication series

Name	Proceedings - International Conference on Pattern Recognition
Volume	2018-August
ISSN (Print)	1051-4651

Conference

Conference	24th International Conference on Pattern Recognition, ICPR 2018
Country/Territory	China
City	Beijing
Period	20/08/18 → 24/08/18

Access to Document

10.1109/ICPR.2018.8545049

Cite this

Yan, S., Wu, F., Smith, J. S., Lu, W., & Zhang, B. (2018). Image Captioning using Adversarial Networks and Reinforcement Learning. In 2018 24th International Conference on Pattern Recognition, ICPR 2018 (pp. 248-253). Article 8545049 (Proceedings - International Conference on Pattern Recognition; Vol. 2018-August). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPR.2018.8545049

@inproceedings{706b1be64df74f68bb2556b08e8eb133,
title = "Image Captioning using Adversarial Networks and Reinforcement Learning",
abstract = "Image captioning is a significant task in artificial intelligence that connects computer vision and natural language processing. With the rapid development of deep learning, the sequence-to-sequence model with attention has become one of the main approaches to image captioning. Nevertheless, a significant issue exists in the current framework: the exposure bias problem of Maximum Likelihood Estimation (MLE) in the sequence model. To address this problem, we use generative adversarial networks (GANs) for image captioning, which compensate for the exposure bias of MLE and can also generate more realistic captions. GANs, however, cannot be directly applied to a discrete task such as language processing because of the discontinuity of the data. Hence, we use a reinforcement learning (RL) technique to estimate the gradients for the network. In addition, to obtain intermediate rewards during language generation, a Monte Carlo roll-out sampling method is used. Experimental results on the COCO dataset validate the improvement contributed by each component of the proposed model. The overall effectiveness is also evaluated.",
author = "Shiyang Yan and Fangyu Wu and Smith, {Jeremy S.} and Wenjin Lu and Bailing Zhang",
note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 24th International Conference on Pattern Recognition, ICPR 2018 ; Conference date: 20-08-2018 Through 24-08-2018",
year = "2018",
month = nov,
day = "26",
doi = "10.1109/ICPR.2018.8545049",
language = "English",
series = "Proceedings - International Conference on Pattern Recognition",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "248--253",
booktitle = "2018 24th International Conference on Pattern Recognition, ICPR 2018",
}

Yan, S, Wu, F, Smith, JS, Lu, W & Zhang, B 2018, Image Captioning using Adversarial Networks and Reinforcement Learning. in 2018 24th International Conference on Pattern Recognition, ICPR 2018, 8545049, Proceedings - International Conference on Pattern Recognition, vol. 2018-August, Institute of Electrical and Electronics Engineers Inc., pp. 248-253, 24th International Conference on Pattern Recognition, ICPR 2018, Beijing, China, 20/08/18. https://doi.org/10.1109/ICPR.2018.8545049

Image Captioning using Adversarial Networks and Reinforcement Learning. / Yan, Shiyang; Wu, Fangyu; Smith, Jeremy S. et al.
2018 24th International Conference on Pattern Recognition, ICPR 2018. Institute of Electrical and Electronics Engineers Inc., 2018. p. 248-253 8545049 (Proceedings - International Conference on Pattern Recognition; Vol. 2018-August).

TY  - GEN
T1  - Image Captioning using Adversarial Networks and Reinforcement Learning
AU  - Yan, Shiyang
AU  - Wu, Fangyu
AU  - Smith, Jeremy S.
AU  - Lu, Wenjin
AU  - Zhang, Bailing
PY  - 2018/11/26
Y1  - 2018/11/26
AB  - Image captioning is a significant task in artificial intelligence that connects computer vision and natural language processing. With the rapid development of deep learning, the sequence-to-sequence model with attention has become one of the main approaches to image captioning. Nevertheless, a significant issue exists in the current framework: the exposure bias problem of Maximum Likelihood Estimation (MLE) in the sequence model. To address this problem, we use generative adversarial networks (GANs) for image captioning, which compensate for the exposure bias of MLE and can also generate more realistic captions. GANs, however, cannot be directly applied to a discrete task such as language processing because of the discontinuity of the data. Hence, we use a reinforcement learning (RL) technique to estimate the gradients for the network. In addition, to obtain intermediate rewards during language generation, a Monte Carlo roll-out sampling method is used. Experimental results on the COCO dataset validate the improvement contributed by each component of the proposed model. The overall effectiveness is also evaluated.
UR  - http://www.scopus.com/inward/record.url?scp=85059743054&partnerID=8YFLogxK
U2  - 10.1109/ICPR.2018.8545049
DO  - 10.1109/ICPR.2018.8545049
M3  - Conference Proceeding
AN  - SCOPUS:85059743054
T3  - Proceedings - International Conference on Pattern Recognition
SP  - 248
EP  - 253
BT  - 2018 24th International Conference on Pattern Recognition, ICPR 2018
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 24th International Conference on Pattern Recognition, ICPR 2018
Y2  - 20 August 2018 through 24 August 2018
ER  -
