TY - JOUR
T1 - Cycle-Free Weakly Referring Expression Grounding With Self-Paced Learning
AU - Sun, Mingjie
AU - Xiao, Jimin
AU - Lim, Eng Gee
AU - Zhao, Yao
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2023
Y1 - 2023
N2 - In this paper, we tackle the weakly referring expression grounding task, which localizes the target object in an image according to a given query sentence, where the mapping between the query sentence and image regions is unavailable during training. Previous methods all follow a cyclic forward-backward pipeline: the query sentence is first converted to a result region through the forward module, and the result region is then converted back to a sentence through the backward module, with the difference between the reconstructed sentence and the original query used as the loss to optimize the entire network. These methods, however, suffer from a deviation issue: the result region generated by the forward module can deviate entirely from the target area while the backward module still reconstructs a similar sentence. The reconstruction loss cannot penalize this kind of deviation because the predicted sentence remains consistent. To overcome this limitation, we propose a cycle-free pipeline, in which a region describer network predicts a textual description for each candidate region, and the result region is selected according to the similarity between the predicted description and the query sentence. Furthermore, a self-paced learning mechanism is designed to avoid the drift issue during the warm-up period of the optimization process. The proposed method achieves higher average accuracy on the RefCOCO and RefCOCO+ datasets than all previous state-of-the-art methods.
KW - Referring expression grounding
KW - self-paced learning
KW - weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85122583188&partnerID=8YFLogxK
U2 - 10.1109/TMM.2021.3139467
DO - 10.1109/TMM.2021.3139467
M3 - Article
AN - SCOPUS:85122583188
SN - 1520-9210
VL - 25
SP - 1611
EP - 1621
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -