HCPI-HRL: Human Causal Perception and Inference-driven Hierarchical Reinforcement Learning

Bin Chen, Zehong Cao*, Wolfgang Mayer, Markus Stumptner, Ryszard Kowalczyk

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The dependency on extensive expert knowledge for defining subgoals in hierarchical reinforcement learning (HRL) restricts the training efficiency and adaptability of HRL agents in complex, dynamic environments. Inspired by human-guided causal discovery skills, we propose a novel method, Human Causal Perception and Inference-driven Hierarchical Reinforcement Learning (HCPI-HRL). It uses stable causal relationships to identify critical objects in dynamic environment states and to infer diverse, effective subgoal structures that serve as intrinsic rewards. HCPI-HRL is intended to guide an agent's exploration and to promote the reuse of learned subgoal structures across different tasks. The method comprises two levels: the top level operates as a meta controller that assigns subgoals discovered through human-driven causal perception of critical objects and causal structure inference; the bottom level uses the Proximal Policy Optimisation (PPO) algorithm to accomplish the assigned subgoals. Experiments in discrete and continuous control environments show that HCPI-HRL outperforms benchmark methods, such as hierarchical and adjacency PPO, in training efficiency, exploration capability, and transferability. This research extends HRL by incorporating human-guided causal modelling to infer effective relationships among subgoals, improving an agent's ability to learn efficient policies in dynamic environments with sparse reward signals.

Original language: English
Article number: 107318
Journal: Neural Networks
Volume: 187
Publication status: Published - Jul 2025
Externally published: Yes

Keywords

  • Causal inference
  • Deep reinforcement learning
  • Hierarchical reinforcement learning
  • Subgoal discovery
