TY - JOUR
T1 - HCPI-HRL
T2 - Human Causal Perception and Inference-driven Hierarchical Reinforcement Learning
AU - Chen, Bin
AU - Cao, Zehong
AU - Mayer, Wolfgang
AU - Stumptner, Markus
AU - Kowalczyk, Ryszard
N1 - Publisher Copyright:
© 2025 The Authors
PY - 2025/7
Y1 - 2025/7
N2 - The dependency on extensive expert knowledge for defining subgoals in hierarchical reinforcement learning (HRL) restricts the training efficiency and adaptability of HRL agents in complex, dynamic environments. Inspired by human-guided causal discovery skills, we propose a novel method, Human Causal Perception and Inference-driven Hierarchical Reinforcement Learning (HCPI-HRL), designed to infer diverse, effective subgoal structures as intrinsic rewards and to incorporate critical objects from dynamic environmental states using stable causal relationships. HCPI-HRL aims to guide an agent's exploration direction and to promote the reuse of learned subgoal structures across different tasks. HCPI-HRL comprises two levels: the top level operates as a meta controller, assigning subgoals discovered through human-driven causal critical object perception and causal structure inference; the bottom level employs the Proximal Policy Optimisation (PPO) algorithm to accomplish the assigned subgoals. Experiments conducted across discrete and continuous control environments demonstrated that HCPI-HRL outperforms benchmark methods such as hierarchical and adjacency PPO in terms of training efficiency, exploration capability, and transferability. Our research extends the potential of HRL methods by incorporating human-guided causal modelling to infer effective relationships across subgoals, enhancing the agent's capability to learn efficient policies in dynamic environments with sparse reward signals.
AB - The dependency on extensive expert knowledge for defining subgoals in hierarchical reinforcement learning (HRL) restricts the training efficiency and adaptability of HRL agents in complex, dynamic environments. Inspired by human-guided causal discovery skills, we propose a novel method, Human Causal Perception and Inference-driven Hierarchical Reinforcement Learning (HCPI-HRL), designed to infer diverse, effective subgoal structures as intrinsic rewards and to incorporate critical objects from dynamic environmental states using stable causal relationships. HCPI-HRL aims to guide an agent's exploration direction and to promote the reuse of learned subgoal structures across different tasks. HCPI-HRL comprises two levels: the top level operates as a meta controller, assigning subgoals discovered through human-driven causal critical object perception and causal structure inference; the bottom level employs the Proximal Policy Optimisation (PPO) algorithm to accomplish the assigned subgoals. Experiments conducted across discrete and continuous control environments demonstrated that HCPI-HRL outperforms benchmark methods such as hierarchical and adjacency PPO in terms of training efficiency, exploration capability, and transferability. Our research extends the potential of HRL methods by incorporating human-guided causal modelling to infer effective relationships across subgoals, enhancing the agent's capability to learn efficient policies in dynamic environments with sparse reward signals.
KW - Causal inference
KW - Deep reinforcement learning
KW - Hierarchical reinforcement learning
KW - Subgoal discovery
UR - http://www.scopus.com/inward/record.url?scp=86000317510&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2025.107318
DO - 10.1016/j.neunet.2025.107318
M3 - Article
C2 - 40058179
AN - SCOPUS:86000317510
SN - 0893-6080
VL - 187
JO - Neural Networks
JF - Neural Networks
M1 - 107318
ER -
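
The abstract describes a two-level control loop: a meta controller assigns subgoals that follow a human-guided causal ordering, and a goal-conditioned low-level PPO policy is rewarded intrinsically for completing them. The following is a minimal, self-contained sketch of that loop only; the toy environment, the subgoal names, and the stubbed ppo_update are hypothetical illustrations and are not the paper's HCPI-HRL implementation or its causal perception and inference machinery.

```python
# Minimal sketch (not the authors' HCPI-HRL code) of a two-level loop:
# a meta controller assigns causally ordered subgoals, a low-level policy
# acts on them and receives an intrinsic reward for reaching each subgoal.
import random

# Hypothetical subgoal set, assumed ordered by a human-guided causal graph:
# "get_key" precedes "open_door", which precedes "reach_exit".
CAUSAL_SUBGOALS = ["get_key", "open_door", "reach_exit"]


class ToyEnv:
    """Toy environment: an action may complete the next unfinished subgoal."""

    def reset(self):
        self.flags = {g: False for g in CAUSAL_SUBGOALS}
        return dict(self.flags)

    def step(self, action):
        for g in CAUSAL_SUBGOALS:            # only the next unfinished subgoal can progress
            if not self.flags[g]:
                if action == g and random.random() < 0.5:
                    self.flags[g] = True
                break
        done = all(self.flags.values())
        extrinsic = 1.0 if done else 0.0     # sparse extrinsic reward at task completion
        return dict(self.flags), extrinsic, done


def meta_controller(achieved):
    """Top level: assign the first subgoal whose causal predecessors are achieved."""
    for g in CAUSAL_SUBGOALS:
        if g not in achieved:
            return g
    return None


def low_level_policy(state, subgoal):
    """Stand-in for the goal-conditioned PPO actor: mostly attempts the assigned subgoal."""
    return subgoal if random.random() < 0.8 else random.choice(CAUSAL_SUBGOALS)


def ppo_update(trajectory):
    """Placeholder for the PPO clipped-surrogate update on the collected rollout."""
    pass


def train_episode(env, horizon=50):
    state, achieved, trajectory = env.reset(), set(), []
    subgoal = meta_controller(achieved)
    for _ in range(horizon):
        action = low_level_policy(state, subgoal)
        next_state, extrinsic, done = env.step(action)
        intrinsic = 1.0 if next_state[subgoal] else 0.0   # reward for reaching the subgoal
        trajectory.append((state, subgoal, action, extrinsic + intrinsic, next_state, done))
        if intrinsic > 0:                                  # subgoal reached: query the top level again
            achieved.add(subgoal)
            subgoal = meta_controller(achieved)
        state = next_state
        if done or subgoal is None:
            break
    ppo_update(trajectory)
    return achieved


if __name__ == "__main__":
    print(train_episode(ToyEnv()))
```

In the paper the ordering over subgoals is inferred from human causal perception and causal structure inference; here it is hard-coded purely so the sketch runs on its own.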