TY - JOUR
T1 - Simple yet effective: An explicit query-based relation learner for human–object-interaction detection
AU - Luo, Tianlun
AU - Yuan, Qiao
AU - Zhu, Boxuan
AU - Guan, Steven
AU - Yang, Rui
AU - Smith, Jeremy S.
AU - Lim, Eng Gee
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2025/10/7
Y1 - 2025/10/7
N2 - Human-object-interaction (HOI) detection is a task in which a model must detect human-object pairs and recognize the interactions between them. Recent advances in Detection Transformers have had a major impact on HOI detection research. However, Transformer-based HOI detection methods exploit only randomized queries to decode HOI triplets. For HOI detection models that use visual and other feature modalities as explicit queries, computational complexity can increase significantly compared with randomized implicit queries. This paper proposes a simplified approach for Transformer-based HOI detection models that constructs explicit queries from only the visual features of instances, combined with a set prediction training strategy. It also proposes a novel method for extracting instance visual features and reducing their dimensionality, which can benefit HOI learning. The proposed model achieves state-of-the-art performance on the HICO-Det benchmark and shows competitive training efficiency compared with other HOI detection methods.
AB - Human-object-interaction (HOI) detection is a task in which a model must detect human-object pairs and recognize the interactions between them. Recent advances in Detection Transformers have had a major impact on HOI detection research. However, Transformer-based HOI detection methods exploit only randomized queries to decode HOI triplets. For HOI detection models that use visual and other feature modalities as explicit queries, computational complexity can increase significantly compared with randomized implicit queries. This paper proposes a simplified approach for Transformer-based HOI detection models that constructs explicit queries from only the visual features of instances, combined with a set prediction training strategy. It also proposes a novel method for extracting instance visual features and reducing their dimensionality, which can benefit HOI learning. The proposed model achieves state-of-the-art performance on the HICO-Det benchmark and shows competitive training efficiency compared with other HOI detection methods.
KW - Computer vision
KW - Detection transformer
KW - Human-object-interaction (HOI) detection
UR - http://www.scopus.com/inward/record.url?scp=105008891921&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2025.130709
DO - 10.1016/j.neucom.2025.130709
M3 - Article
AN - SCOPUS:105008891921
SN - 0925-2312
VL - 649
JO - Neurocomputing
JF - Neurocomputing
M1 - 130709
ER -