Simple yet effective: An explicit query-based relation learner for human–object-interaction detection

Tianlun Luo, Qiao Yuan, Boxuan Zhu, Steven Guan*, Rui Yang, Jeremy S. Smith, Eng Gee Lim

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Human-object-interaction (HOI) detection requires a model to detect human-object pairs and recognize the interactions between them. Recent advances in Detection Transformers have had a great impact on HOI detection research. However, Transformer-based HOI detection methods exploit only randomized queries for decoding HOI triplets. For HOI detection models that use visual features and features from other modalities as explicit queries, the computational complexity can be significantly higher than with randomized implicit queries. This paper proposes a simplified approach for constructing explicit queries using only the visual features of instances, combined with the set prediction training strategy used in Transformer-based HOI detection models. It also proposes a novel method for extracting instance visual features and reducing their dimension, which can benefit the learning of HOIs. The proposed model achieves state-of-the-art performance on the HICO-Det benchmark and shows competitive training efficiency compared to other HOI detection methods.

Original language: English
Article number: 130709
Journal: Neurocomputing
Volume: 649
Publication status: Published - 7 Oct 2025

Keywords

  • Computer vision
  • Detection transformer
  • Human-object-interaction (HOI) detection
