TY - GEN
T1 - Improving image caption performance with linguistic context
AU - Cao, Yupeng
AU - Wang, Qiu Feng
AU - Huang, Kaizhu
AU - Zhang, Rui
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
N2 - Image captioning aims to generate a description of an image using techniques from computer vision and natural language processing, where a framework of Convolutional Neural Networks (CNN) followed by Recurrent Neural Networks (RNN), particularly LSTM, is widely used. In recent years, attention-based CNN-LSTM networks have attained significant progress due to their ability to model global context. However, CNN-LSTMs do not consider linguistic context explicitly, which is very useful for further boosting performance. To overcome this issue, we propose a method that integrates an n-gram model into the attention-based image caption framework, modelling the word transition probability in the decoding process to enhance the linguistic context of the generated captions. We evaluated performance using BLEU and METEOR on the MSCOCO 2014 benchmark dataset. Experimental results show the effectiveness of the proposed method. Specifically, BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR are improved by 0.2%, 0.7%, 0.6%, 0.5%, and 0.1, respectively.
AB - Image captioning aims to generate a description of an image using techniques from computer vision and natural language processing, where a framework of Convolutional Neural Networks (CNN) followed by Recurrent Neural Networks (RNN), particularly LSTM, is widely used. In recent years, attention-based CNN-LSTM networks have attained significant progress due to their ability to model global context. However, CNN-LSTMs do not consider linguistic context explicitly, which is very useful for further boosting performance. To overcome this issue, we propose a method that integrates an n-gram model into the attention-based image caption framework, modelling the word transition probability in the decoding process to enhance the linguistic context of the generated captions. We evaluated performance using BLEU and METEOR on the MSCOCO 2014 benchmark dataset. Experimental results show the effectiveness of the proposed method. Specifically, BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR are improved by 0.2%, 0.7%, 0.6%, 0.5%, and 0.1, respectively.
KW - Attention-based model
KW - CNN
KW - Image caption
KW - Linguistic context
KW - Long short-term memory
UR - http://www.scopus.com/inward/record.url?scp=85080862338&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-39431-8_1
DO - 10.1007/978-3-030-39431-8_1
M3 - Conference Proceeding
AN - SCOPUS:85080862338
SN - 9783030394301
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 11
BT - Advances in Brain Inspired Cognitive Systems - 10th International Conference, BICS 2019, Proceedings
A2 - Ren, Jinchang
A2 - Hussain, Amir
A2 - Zhao, Huimin
A2 - Cai, Jun
A2 - Chen, Rongjun
A2 - Xiao, Yinyin
A2 - Huang, Kaizhu
A2 - Zheng, Jiangbin
PB - Springer
T2 - 10th International Conference on Brain Inspired Cognitive Systems, BICS 2019
Y2 - 13 July 2019 through 14 July 2019
ER -