Simple yet effective: An explicit query-based relation learner for human–object-interaction detection

Tianlun Luo; Qiao Yuan; Boxuan Zhu; Steven Guan; Rui Yang; Jeremy S. Smith; Eng Gee Lim

doi:10.1016/j.neucom.2025.130709

Corresponding author: Steven Guan

Research output: Contribution to journal › Article › peer-review

Abstract

Human-object-interaction (HOI) detection requires a model to detect human-object pairs and recognize the interactions between them. Recent advances in Detection Transformers have had a great impact on HOI detection research. However, Transformer-based HOI detection methods only exploit randomized queries for decoding HOI triplets. For HOI detection models that use visual features and features from other modalities as explicit queries, the computational complexity can be significantly higher than with randomized implicit queries. This paper proposes a simplified approach for Transformer-based HOI detection models that constructs explicit queries from only the visual features of instances and combines them with a set prediction training strategy. This paper also proposes a novel method to extract instance visual features and reduce their dimension, which can benefit the learning of HOIs. The proposed model achieves state-of-the-art performance on the HICO-Det benchmark and shows competitive training efficiency compared to other HOI detection methods.

Original language	English
Article number	130709
Journal	Neurocomputing
Volume	649
DOIs	https://doi.org/10.1016/j.neucom.2025.130709
Publication status	Published - 7 Oct 2025

Keywords

Computer vision
Detection transformer
Human-object-interaction (HOI) detection

Cite this

@article{1418ac5cb043442f8445c9e1a59555a7,

title = "Simple yet effective: An explicit query-based relation learner for human–object-interaction detection",

abstract = "Human-object-interaction (HOI) detection is a task that requires the model to detect human-object pairs and recognize the interactions between them. Recent advances in Detection Transformers have had a great impact on the HOI detection research field. However, Transformer-based HOI detection methods only exploit randomized queries for decoding HOI triplets. For the HOI detection model using visual and other modalities of features as explicit queries, the computation complexity of the model can be significantly increased compared to randomized implicit queries. In this paper, a simplified approach for constructing explicit queries with only visual features of instances combined with a set prediction training strategy, used in Transformer-based HOI detection models, is proposed. This paper also proposes a novel method to extract and reduce the dimension of instance visual features, which can benefit the learning process of HOIs. The model proposed, in this paper, achieves state-of-the-art performance on the HICO-Det benchmark and shows a competitive training efficiency compared to other HOI detection methods.",

keywords = "Computer vision, Detection transformer, Human-object-interaction (HOI) detection",

author = "Tianlun Luo and Qiao Yuan and Boxuan Zhu and Steven Guan and Rui Yang and Smith, {Jeremy S.} and {Gee Lim}, Eng",

note = "Publisher Copyright: {\textcopyright} 2025 Elsevier B.V.",

year = "2025",

month = oct,

day = "7",

doi = "10.1016/j.neucom.2025.130709",

language = "English",

volume = "649",

journal = "Neurocomputing",

issn = "0925-2312",

}

TY - JOUR

T1 - Simple yet effective

T2 - An explicit query-based relation learner for human–object-interaction detection

AU - Luo, Tianlun

AU - Yuan, Qiao

AU - Zhu, Boxuan

AU - Guan, Steven

AU - Yang, Rui

AU - Smith, Jeremy S.

AU - Gee Lim, Eng

PY - 2025/10/7

Y1 - 2025/10/7

AB - Human-object-interaction (HOI) detection is a task that requires the model to detect human-object pairs and recognize the interactions between them. Recent advances in Detection Transformers have had a great impact on the HOI detection research field. However, Transformer-based HOI detection methods only exploit randomized queries for decoding HOI triplets. For the HOI detection model using visual and other modalities of features as explicit queries, the computation complexity of the model can be significantly increased compared to randomized implicit queries. In this paper, a simplified approach for constructing explicit queries with only visual features of instances combined with a set prediction training strategy, used in Transformer-based HOI detection models, is proposed. This paper also proposes a novel method to extract and reduce the dimension of instance visual features, which can benefit the learning process of HOIs. The model proposed, in this paper, achieves state-of-the-art performance on the HICO-Det benchmark and shows a competitive training efficiency compared to other HOI detection methods.

KW - Computer vision

KW - Detection transformer

KW - Human-object-interaction (HOI) detection

UR - http://www.scopus.com/inward/record.url?scp=105008891921&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2025.130709

DO - 10.1016/j.neucom.2025.130709

M3 - Article

AN - SCOPUS:105008891921

SN - 0925-2312

VL - 649

JO - Neurocomputing

JF - Neurocomputing

M1 - 130709

ER -
