Common Sense Language-Guided Exploration and Hierarchical Dense Perception for Instruction Following Embodied Agents

Yuanwen Chen, Xinyao Zhang, Yaran Chen*, Dongbin Zhao, Yunzhen Zhao, Zhe Zhao, Pengfei Hu

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

1 Citation (Scopus)

Abstract

Embodied Instruction Following (EIF) involves the task of locating and manipulating objects according to language instructions. Existing methods face challenges in small object navigation due to ineffective exploration and imperfect perception, which ultimately affects their performance. This study focuses on small object navigation in the EIF domain. We propose Common Sense Language-guided exploration (CSL), a novel approach that leverages common-sense knowledge from seen scenes and information from language instructions to infer the location of objects. The proposed CSL significantly improves exploration efficiency. Additionally, we propose Hierarchical Dense Perception (HDP), which uses hierarchical features to perform semantic segmentation and depth estimation. The use of HDP significantly improves the agent's perceptual capabilities. Experiments on the ALFRED benchmark demonstrate the effectiveness of CSL-HDP. The proposed CSL-HDP achieves an absolute improvement of 9.29% (18.45% relative) on unseen test scenes compared to the previous state-of-the-art, securing the top position on the leaderboard. Code will be available at https://github.com/Cyuanwen/CSL-HDP.

Original language: English
Title of host publication: 2024 IEEE International Conference on Multimedia and Expo, ICME 2024
Publisher: IEEE Computer Society
ISBN (Electronic): 9798350390155
DOIs
Publication status: Published - 2024
Event: 2024 IEEE International Conference on Multimedia and Expo, ICME 2024 - Niagara Falls, Canada
Duration: 15 Jul 2024 to 19 Jul 2024

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2024 IEEE International Conference on Multimedia and Expo, ICME 2024
Country/Territory: Canada
City: Niagara Falls
Period: 15/07/24 to 19/07/24

Keywords

  • Computer Vision
  • Embodied Instruction Following
  • Object Navigation
