TY - JOUR
T1 - HCPI-HRL
T2 - Human Causal Perception and Inference-driven Hierarchical Reinforcement Learning
AU - Chen, Bin
AU - Cao, Zehong
AU - Mayer, Wolfgang
AU - Stumptner, Markus
AU - Kowalczyk, Ryszard
N1 - Publisher Copyright:
© 2025 The Authors
PY - 2025/7
Y1 - 2025/7
N2 - The dependency on extensive expert knowledge for defining subgoals in hierarchical reinforcement learning (HRL) restricts the training efficiency and adaptability of HRL agents in complex, dynamic environments. Inspired by human-guided causal discovery skills, we propose a novel method, Human Causal Perception and Inference-driven Hierarchical Reinforcement Learning (HCPI-HRL), designed to infer diverse, effective subgoal structures as intrinsic rewards and to incorporate critical objects from dynamic environmental states using stable causal relationships. HCPI-HRL aims to guide an agent's exploration direction and to promote the reuse of learned subgoal structures across different tasks. HCPI-HRL comprises two levels: the top level operates as a meta controller, assigning subgoals discovered through human-driven causal critical object perception and causal structure inference; the bottom level employs the Proximal Policy Optimisation (PPO) algorithm to accomplish the assigned subgoals. Experiments conducted across discrete and continuous control environments demonstrated that HCPI-HRL outperforms benchmark methods such as hierarchical and adjacency PPO in terms of training efficiency, exploration capability, and transferability. Our research extends the potential of HRL methods by incorporating human-guided causal modelling to infer effective relationships across subgoals, enhancing the agent's capability to learn efficient policies in dynamic environments with sparse reward signals.
AB - The dependency on extensive expert knowledge for defining subgoals in hierarchical reinforcement learning (HRL) restricts the training efficiency and adaptability of HRL agents in complex, dynamic environments. Inspired by human-guided causal discovery skills, we propose a novel method, Human Causal Perception and Inference-driven Hierarchical Reinforcement Learning (HCPI-HRL), designed to infer diverse, effective subgoal structures as intrinsic rewards and to incorporate critical objects from dynamic environmental states using stable causal relationships. HCPI-HRL aims to guide an agent's exploration direction and to promote the reuse of learned subgoal structures across different tasks. HCPI-HRL comprises two levels: the top level operates as a meta controller, assigning subgoals discovered through human-driven causal critical object perception and causal structure inference; the bottom level employs the Proximal Policy Optimisation (PPO) algorithm to accomplish the assigned subgoals. Experiments conducted across discrete and continuous control environments demonstrated that HCPI-HRL outperforms benchmark methods such as hierarchical and adjacency PPO in terms of training efficiency, exploration capability, and transferability. Our research extends the potential of HRL methods by incorporating human-guided causal modelling to infer effective relationships across subgoals, enhancing the agent's capability to learn efficient policies in dynamic environments with sparse reward signals.
KW - Causal inference
KW - Deep reinforcement learning
KW - Hierarchical reinforcement learning
KW - Subgoal discovery
UR - http://www.scopus.com/inward/record.url?scp=86000317510&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2025.107318
DO - 10.1016/j.neunet.2025.107318
M3 - Article
C2 - 40058179
AN - SCOPUS:86000317510
SN - 0893-6080
VL - 187
JO - Neural Networks
JF - Neural Networks
M1 - 107318
ER -
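
The abstract describes a two-level control loop: a meta controller assigns subgoals that follow a human-guided causal ordering, and a goal-conditioned low-level PPO policy is rewarded intrinsically for completing them. The following is a minimal, self-contained sketch of that loop only; the toy environment, the subgoal names, and the stubbed ppo_update are hypothetical illustrations and are not the paper's HCPI-HRL implementation or its causal perception and inference machinery.

```python
# Minimal sketch (not the authors' HCPI-HRL code) of a two-level loop:
# a meta controller assigns causally ordered subgoals, a low-level policy
# acts on them and receives an intrinsic reward for reaching each subgoal.
import random

# Hypothetical subgoal set, assumed ordered by a human-guided causal graph:
# "get_key" precedes "open_door", which precedes "reach_exit".
CAUSAL_SUBGOALS = ["get_key", "open_door", "reach_exit"]


class ToyEnv:
    """Toy environment: an action may complete the next unfinished subgoal."""

    def reset(self):
        self.flags = {g: False for g in CAUSAL_SUBGOALS}
        return dict(self.flags)

    def step(self, action):
        for g in CAUSAL_SUBGOALS:            # only the next unfinished subgoal can progress
            if not self.flags[g]:
                if action == g and random.random() < 0.5:
                    self.flags[g] = True
                break
        done = all(self.flags.values())
        extrinsic = 1.0 if done else 0.0     # sparse extrinsic reward at task completion
        return dict(self.flags), extrinsic, done


def meta_controller(achieved):
    """Top level: assign the first subgoal whose causal predecessors are achieved."""
    for g in CAUSAL_SUBGOALS:
        if g not in achieved:
            return g
    return None


def low_level_policy(state, subgoal):
    """Stand-in for the goal-conditioned PPO actor: mostly attempts the assigned subgoal."""
    return subgoal if random.random() < 0.8 else random.choice(CAUSAL_SUBGOALS)


def ppo_update(trajectory):
    """Placeholder for the PPO clipped-surrogate update on the collected rollout."""
    pass


def train_episode(env, horizon=50):
    state, achieved, trajectory = env.reset(), set(), []
    subgoal = meta_controller(achieved)
    for _ in range(horizon):
        action = low_level_policy(state, subgoal)
        next_state, extrinsic, done = env.step(action)
        intrinsic = 1.0 if next_state[subgoal] else 0.0   # reward for reaching the subgoal
        trajectory.append((state, subgoal, action, extrinsic + intrinsic, next_state, done))
        if intrinsic > 0:                                  # subgoal reached: query the top level again
            achieved.add(subgoal)
            subgoal = meta_controller(achieved)
        state = next_state
        if done or subgoal is None:
            break
    ppo_update(trajectory)
    return achieved


if __name__ == "__main__":
    print(train_episode(ToyEnv()))
```

In the paper the ordering over subgoals is inferred from human causal perception and causal structure inference; here it is hard-coded purely so the sketch runs on its own.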