Cycle-Free Weakly Referring Expression Grounding With Self-Paced Learning

Mingjie Sun, Jimin Xiao*, Eng Gee Lim, Yao Zhao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)


In this paper, we tackle the weakly supervised referring expression grounding task: localizing the target object in an image according to a given query sentence, where the mapping between the query sentence and image regions is not available during training. Previous methods all follow a cyclic forward-backward pipeline, in which the forward module first converts the query sentence to a result region, and the backward module then converts that region back to a sentence; the difference between the reconstructed sentence and the original query serves as the loss to optimize the entire network. These methods, however, suffer from a deviation issue: the result region generated by the forward module may deviate completely from the target area while the backward module still reconstructs a similar sentence. The reconstruction loss cannot penalize this kind of deviation, because the reconstructed sentence remains consistent with the query. To overcome this limitation, we propose a cycle-free pipeline, in which a region describer network predicts a textual description for each candidate region, and the result region is selected according to the similarity between the predicted description and the query sentence. Furthermore, a self-paced learning mechanism is designed to avoid the drift issue during the warm-up period of the optimization process. The proposed method achieves higher average accuracy on the RefCOCO and RefCOCO+ datasets than all previous state-of-the-art methods.
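As a rough illustration of the cycle-free selection step and of self-paced sample selection, the following minimal sketch may help. It is not the authors' implementation: the abstract does not specify the similarity measure, the embedding model, or the self-paced schedule, so cosine similarity over fixed-length embeddings and a hard loss threshold are assumptions made here, and the names `select_region` and `self_paced_mask` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_region(
    query_embedding: Sequence[float],
    description_embeddings: Sequence[Sequence[float]],
) -> Tuple[int, List[float]]:
    """Pick the candidate region whose predicted description best matches the query.

    Assumption: each candidate region already has an embedding of the description
    predicted for it (e.g. by some region describer), in the same space as the query.
    Returns the index of the best region and the per-region similarity scores.
    """
    if not description_embeddings:
        raise ValueError("at least one candidate region is required")
    scores = [cosine_similarity(query_embedding, d) for d in description_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def self_paced_mask(losses: Sequence[float], threshold: float) -> List[int]:
    """Classic hard self-paced weighting (an assumption, not the paper's exact rule):
    keep only samples whose current loss is below the threshold."""
    return [1 if loss < threshold else 0 for loss in losses]


# Toy usage with hand-made 2-D embeddings (purely illustrative values).
query = [1.0, 0.0]
regions = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
best_index, region_scores = select_region(query, regions)
sample_mask = self_paced_mask([0.2, 1.5, 0.7], threshold=1.0)
```

In a self-paced setup of this kind, the threshold would typically be raised as training proceeds so that harder samples are admitted gradually; how the paper schedules this is not stated in the abstract.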

Original language: English
Pages (from-to): 1611-1621
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Publication status: Published - 2023


  • Referring expression grounding
  • self-paced learning
  • weakly supervised learning

