Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding

Mingjie Sun; Jimin Xiao; Eng Gee Lim; Si Liu; John Y. Goulermas

doi:10.1109/TPAMI.2021.3058684

Mingjie Sun^*, Jimin Xiao, Eng Gee Lim, Si Liu, John Y. Goulermas

^*Corresponding author for this work

Department of Intelligent Science

Research output: Contribution to journal › Article › peer-review

51 Citations (Scopus)

Abstract

In this paper, we tackle the weakly-supervised referring expression grounding task: localizing a referent object in an image according to a query sentence when the mapping between image regions and queries is not available during training. Traditional methods pick out the object region that best matches the referring expression and then reconstruct the query sentence from the selected region, with the reconstruction difference serving as the loss for back-propagation. These methods, however, conduct both the matching and the reconstruction only approximately, as they ignore the fact that the matching correctness is unknown. To overcome this limitation, we design a discriminative triad as the basis of the solution, through which a query can be converted into one or multiple discriminative triads in a very scalable way. Based on the discriminative triad, we further propose triad-level matching and reconstruction modules that are lightweight yet effective for weakly-supervised training, making the method three times lighter and faster than the previous state-of-the-art methods. One important merit of our work is its superior performance despite the simple and neat design. Specifically, the proposed method achieves a new state-of-the-art accuracy on the RefCOCO (39.21 percent), RefCOCO+ (39.18 percent) and RefCOCOg (43.24 percent) datasets, which is 4.17, 4.08 and 7.8 percent higher than the previous state of the art, respectively. The code is available at https://github.com/insomnia94/DTWREG.
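The accuracy figures in the abstract can be turned into a rough baseline for comparison. The sketch below assumes that "percent higher" means absolute percentage points (the abstract does not say this explicitly), and the function name `implied_previous` is purely illustrative. It only subtracts the reported gains from the reported accuracies; it is not taken from the authors' code.

```python
# Accuracies (percent) reported in the abstract for the proposed method.
reported = {"RefCOCO": 39.21, "RefCOCO+": 39.18, "RefCOCOg": 43.24}

# Gains over the previous state of the art, as reported in the abstract.
# ASSUMPTION: these are absolute percentage points, not relative improvements.
gains = {"RefCOCO": 4.17, "RefCOCO+": 4.08, "RefCOCOg": 7.8}


def implied_previous(reported, gains):
    """Return the previous accuracy implied by each reported gain."""
    return {name: round(reported[name] - gains[name], 2) for name in reported}


previous = implied_previous(reported, gains)
for name, value in previous.items():
    print(f"{name}: implied previous accuracy {value:.2f} percent")
```

Under that reading, the implied previous accuracies are close to one another across the three datasets, which is worth checking against the paper's own comparison tables before quoting.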

Original language	English
Pages (from-to)	4189-4195
Number of pages	7
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	43
Issue number	11
DOIs	https://doi.org/10.1109/TPAMI.2021.3058684
Publication status	Published - 1 Nov 2021

Keywords

Referring expression grounding
discriminative triad matching
weakly supervised training

Access to Document

10.1109/TPAMI.2021.3058684

Cite this

@article{c38495b1f72f4120953c083a05ffd99c,

title = "Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding",

abstract = "In this paper, we tackle the weakly-supervised referring expression grounding task: localizing a referent object in an image according to a query sentence when the mapping between image regions and queries is not available during training. Traditional methods pick out the object region that best matches the referring expression and then reconstruct the query sentence from the selected region, with the reconstruction difference serving as the loss for back-propagation. These methods, however, conduct both the matching and the reconstruction only approximately, as they ignore the fact that the matching correctness is unknown. To overcome this limitation, we design a discriminative triad as the basis of the solution, through which a query can be converted into one or multiple discriminative triads in a very scalable way. Based on the discriminative triad, we further propose triad-level matching and reconstruction modules that are lightweight yet effective for weakly-supervised training, making the method three times lighter and faster than the previous state-of-the-art methods. One important merit of our work is its superior performance despite the simple and neat design. Specifically, the proposed method achieves a new state-of-the-art accuracy on the RefCOCO (39.21 percent), RefCOCO+ (39.18 percent) and RefCOCOg (43.24 percent) datasets, which is 4.17, 4.08 and 7.8 percent higher than the previous state of the art, respectively. The code is available at https://github.com/insomnia94/DTWREG.",

keywords = "Referring expression grounding, discriminative triad matching, weakly supervised training",

author = "Mingjie Sun and Jimin Xiao and Lim, {Eng Gee} and Si Liu and Goulermas, {John Y.}",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2021",

month = nov,

day = "1",

doi = "10.1109/TPAMI.2021.3058684",

language = "English",

volume = "43",

pages = "4189--4195",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

number = "11",

}

TY - JOUR

T1 - Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding

AU - Sun, Mingjie

AU - Xiao, Jimin

AU - Lim, Eng Gee

AU - Liu, Si

AU - Goulermas, John Y.

PY - 2021/11/1

Y1 - 2021/11/1

N2 - In this paper, we tackle the weakly-supervised referring expression grounding task: localizing a referent object in an image according to a query sentence when the mapping between image regions and queries is not available during training. Traditional methods pick out the object region that best matches the referring expression and then reconstruct the query sentence from the selected region, with the reconstruction difference serving as the loss for back-propagation. These methods, however, conduct both the matching and the reconstruction only approximately, as they ignore the fact that the matching correctness is unknown. To overcome this limitation, we design a discriminative triad as the basis of the solution, through which a query can be converted into one or multiple discriminative triads in a very scalable way. Based on the discriminative triad, we further propose triad-level matching and reconstruction modules that are lightweight yet effective for weakly-supervised training, making the method three times lighter and faster than the previous state-of-the-art methods. One important merit of our work is its superior performance despite the simple and neat design. Specifically, the proposed method achieves a new state-of-the-art accuracy on the RefCOCO (39.21 percent), RefCOCO+ (39.18 percent) and RefCOCOg (43.24 percent) datasets, which is 4.17, 4.08 and 7.8 percent higher than the previous state of the art, respectively. The code is available at https://github.com/insomnia94/DTWREG.

KW - Referring expression grounding

KW - discriminative triad matching

KW - weakly supervised training

UR - http://www.scopus.com/inward/record.url?scp=85100846675&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2021.3058684

DO - 10.1109/TPAMI.2021.3058684

M3 - Article

C2 - 33571088

AN - SCOPUS:85100846675

SN - 0162-8828

VL - 43

SP - 4189

EP - 4195

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 11

ER -
