A Survey on VQA: Datasets and Approaches

Yeyun Zou, Qiyu Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference Proceeding › peer-review

8 Citations (Scopus)

Abstract

Visual question answering (VQA) is a task that combines techniques from computer vision and natural language processing. It requires models to answer a text-based question using the information contained in a visual input. In recent years, the research field of VQA has expanded: work that examines reasoning ability and VQA on scientific diagrams has received growing attention, and more multimodal feature fusion mechanisms have been proposed. This paper reviews and analyzes existing datasets, metrics, and models proposed for the VQA task.

Original language: English
Title of host publication: Proceedings - 2020 2nd International Conference on Information Technology and Computer Application, ITCA 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 289-297
Number of pages: 9
ISBN (Electronic): 9780738111414
DOIs
Publication status: Published - Dec 2020
Externally published: Yes
Event: 2nd International Conference on Information Technology and Computer Application, ITCA 2020 - Guangzhou, China
Duration: 18 Dec 2020 - 20 Dec 2020

Publication series

Name: Proceedings - 2020 2nd International Conference on Information Technology and Computer Application, ITCA 2020

Conference

Conference: 2nd International Conference on Information Technology and Computer Application, ITCA 2020
Country/Territory: China
City: Guangzhou
Period: 18/12/20 - 20/12/20

Keywords

  • component
  • computer vision
  • image retrieval
  • knowledge representation
  • Multimodal learning
  • natural language processing
  • visual question answering
