Abstract
Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits the performance of current webly supervised models is the problem of visual polysemy. In this work, we present a novel framework that resolves visual polysemy by dynamically matching candidate text queries with retrieved images. Specifically, our framework consists of three major steps: we first discover candidate text queries, then dynamically select them according to the keyword-based image search results, and finally employ the proposed saliency-guided deep multi-instance learning (MIL) network to remove outliers and learn classification models for visual disambiguation. Compared to existing methods, our approach can identify the correct visual senses, adapt to dynamic changes in the search results, remove outliers, and jointly learn the classification models. Extensive experiments and ablation studies on the CMU-Poly-30 and MIT-ISD datasets demonstrate the effectiveness of the proposed approach.
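To make the pipeline concrete, the sketch below illustrates the two technical ingredients the abstract names: dynamic selection of text queries based on the coherence of their retrieved images, and a saliency-guided MIL scorer that max-pools region (instance) scores into a bag-level prediction. This is a minimal illustrative sketch, not the authors' released implementation; the class and function names, feature dimensions, and the cosine-similarity coherence heuristic are assumptions.

```python
# Illustrative sketch only: names, dimensions, and the selection heuristic are
# assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SaliencyGuidedMIL(nn.Module):
    """Toy multi-instance model: scores image regions (instances), weights them
    by a saliency score, and max-pools to a bag-level class prediction."""

    def __init__(self, feat_dim=512, num_classes=30):
        super().__init__()
        self.instance_scorer = nn.Linear(feat_dim, num_classes)

    def forward(self, instance_feats, saliency):
        # instance_feats: (num_instances, feat_dim) region features of one image
        # saliency: (num_instances,) saliency weight per region in [0, 1]
        instance_logits = self.instance_scorer(instance_feats)   # (N, C)
        weighted = instance_logits * saliency.unsqueeze(1)       # down-weight background regions
        bag_logits, _ = weighted.max(dim=0)                      # MIL max-pooling over instances
        return bag_logits


def select_queries(candidate_queries, retrieved_feats, min_coherence=0.4):
    """Keep candidate text queries whose retrieved images are visually coherent,
    measured here by mean pairwise cosine similarity (an assumed proxy)."""
    kept = []
    for query, feats in zip(candidate_queries, retrieved_feats):
        feats = F.normalize(feats, dim=1)            # unit-normalize image features
        sim = feats @ feats.t()                      # pairwise cosine similarity
        n = sim.shape[0]
        coherence = (sim.sum() - n) / (n * (n - 1))  # mean off-diagonal similarity
        if coherence >= min_coherence:
            kept.append(query)
    return kept
```

In this reading, queries whose search results scatter across unrelated senses yield low coherence and are discarded, while the MIL scorer lets a few salient, on-topic regions dominate the bag prediction so that noisy web images act as weak rather than hard labels.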
| Original language | English |
|---|---|
| Article number | 107620 |
| Journal | Pattern Recognition |
| Volume | 110 |
| Publication status | Published - Feb 2021 |
Keywords
- Image search
- Text queries
- Visual disambiguation
- Web images