Improving image caption performance with linguistic context

Yupeng Cao, Qiu Feng Wang*, Kaizhu Huang, Rui Zhang

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

1 Citation (Scopus)

Abstract

Image captioning aims to generate a description of an image using techniques from computer vision and natural language processing. A widely used framework pairs a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN), in particular an LSTM. In recent years, attention-based CNN-LSTM networks have made significant progress owing to their ability to model global context. However, CNN-LSTMs do not explicitly consider linguistic context, which is useful for further boosting performance. To overcome this issue, we propose a method that integrates an n-gram model into the attention-based image captioning framework, modelling the word transition probability during decoding to enhance the linguistic context of the generated captions. We evaluated the method on the MSCOCO 2014 benchmark dataset. Experimental results show the effectiveness of the proposed method. Specifically, BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR improve by 0.2%, 0.7%, 0.6%, 0.5%, and 0.1, respectively.
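
The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea: a bigram (n = 2) word transition model is estimated from training captions and its log-probability is added, with a weight, to the decoder's score for each candidate next word. The names `train_bigram`, `rescore`, and `weight`, the add-one smoothing, the log-linear combination, and the toy captions are all illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def train_bigram(captions, vocab):
    """Return a function giving add-one-smoothed log P(word | prev).

    Assumption: a simple bigram model stands in for the n-gram model.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    for caption in captions:
        tokens = ["<s>"] + caption.split()
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    vocab_size = len(vocab)

    def log_prob(prev, word):
        numerator = bigram_counts[(prev, word)] + 1
        denominator = context_counts[prev] + vocab_size
        return math.log(numerator / denominator)

    return log_prob


def rescore(decoder_log_probs, prev_word, bigram_log_prob, weight=0.3):
    """Add the weighted bigram log-probability to each decoder score.

    Assumption: a log-linear combination; `weight` is a hypothetical
    hyperparameter that would need tuning on held-out data.
    """
    return {
        word: score + weight * bigram_log_prob(prev_word, word)
        for word, score in decoder_log_probs.items()
    }


# Toy training captions (illustrative only, not from MSCOCO).
captions = ["a dog runs", "a dog sleeps", "a cat sleeps"]
vocab = {"a", "dog", "cat", "runs", "sleeps"}
lm = train_bigram(captions, vocab)

# Hypothetical decoder output after the word "a": a tie between two nouns.
decoder_scores = {"dog": math.log(0.5), "cat": math.log(0.5)}
rescored = rescore(decoder_scores, "a", lm)
best = max(rescored, key=rescored.get)
```

In this toy setting the decoder alone cannot separate "dog" from "cat", and the transition model breaks the tie in favour of the word that follows "a" more often in the training captions. In a real decoder the same rescoring step would be applied to each beam-search candidate at every time step.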

Original language: English
Title of host publication: Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings
Editors: Jinchang Ren, Amir Hussain, Huimin Zhao, Jun Cai, Rongjun Chen, Yinyin Xiao, Kaizhu Huang, Jiangbin Zheng
Publisher: Springer
Pages: 3-11
Number of pages: 9
ISBN (Print): 9783030394301
Publication status: Published - 2020
Event: 10th International Conference on Brain Inspired Cognitive Systems, BICS 2019 - Guangzhou, China
Duration: 13 Jul 2019 to 14 Jul 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11691 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Conference on Brain Inspired Cognitive Systems, BICS 2019
Country/Territory: China
City: Guangzhou
Period: 13/07/19 to 14/07/19

Keywords

  • Attention-based model
  • CNN
  • Image caption
  • Linguistic context
  • Long-short term memory
