
Exploiting textual queries for dynamically visual disambiguation

  • Zeren Sun
  • Yazhou Yao*
  • Jimin Xiao
  • Lei Zhang
  • Jian Zhang
  • Zhenmin Tang
  • *Corresponding author for this work
  • Nanjing University of Science and Technology
  • Northwestern Polytechnical University Xian
  • University of Technology Sydney

Research output: Contribution to journal (Article, peer-reviewed)

14 Citations (Scopus)

Abstract

Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits the performance of current webly supervised models is visual polysemy. In this work, we present a novel framework that resolves visual polysemy by dynamically matching candidate text queries with retrieved images. Specifically, the proposed framework comprises three major steps: we first discover candidate text queries, then dynamically select them according to the keyword-based image search results, and finally employ the proposed saliency-guided deep multi-instance learning (MIL) network to remove outliers and learn classification models for visual disambiguation. Compared with existing methods, our approach can identify the correct visual senses, adapt to dynamic changes in the search results, remove outliers, and jointly learn the classification models. Extensive experiments and ablation studies on the CMU-Poly-30 and MIT-ISD datasets demonstrate the effectiveness of the proposed approach.
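The data flow of the steps described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation. The scoring rules are hypothetical stand-ins: a candidate query is kept if the mean feature vector of its retrieved images is cosine-similar to the mean of the keyword-search results, and an image is treated as an outlier if it is far from its bag centroid. The names `select_queries` and `filter_bag`, the thresholds, and the toy 3-D feature vectors are all assumptions for illustration. In the paper, outlier removal and classification are handled by a learned saliency-guided deep MIL network, and features would come from a trained model.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_queries(keyword_images: List[Vector],
                   query_images: Dict[str, List[Vector]],
                   threshold: float) -> List[str]:
    """Keep candidate queries whose retrieved images resemble the keyword results.

    Hypothetical rule (not from the paper): cosine similarity between the
    centroid of a query's images and the centroid of the keyword-search images.
    Re-running this on fresh search results lets the selection change over time.
    """
    anchor = centroid(keyword_images)
    return sorted(q for q, imgs in query_images.items()
                  if imgs and cosine(centroid(imgs), anchor) >= threshold)


def filter_bag(bag: List[Vector], threshold: float) -> List[Vector]:
    """Treat one query's images as a MIL bag and drop likely outliers.

    Hypothetical heuristic: keep instances close to the bag centroid. The
    paper's saliency-guided deep MIL network would replace this step.
    """
    c = centroid(bag)
    return [x for x in bag if cosine(x, c) >= threshold]


if __name__ == "__main__":
    # Toy keyword results for "apple": fruit-like and laptop-like images.
    keyword = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
    candidates = {
        "apple fruit": [[1.0, 0.0, 0.0], [0.95, 0.05, 0.0]],
        "apple laptop": [[0.0, 1.0, 0.0]],
        "apple song": [[0.0, 0.0, 1.0]],  # unrelated to the keyword results
    }
    print(select_queries(keyword, candidates, threshold=0.4))
    # → ['apple fruit', 'apple laptop']

    bag = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
    print(len(filter_bag(bag, threshold=0.6)))  # → 2 (the off-topic image is dropped)
```

Because both thresholds are hand-set here, the sketch only shows where query selection and outlier removal sit in the pipeline; it does not learn classification models.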

Original language: English
Article number: 107620
Journal: Pattern Recognition
Volume: 110
Publication status: Published - Feb 2021

Keywords

  • Image search
  • Text queries
  • Visual disambiguation
  • Web images
