HLG: Bridging Human Heuristic Knowledge and Deep Reinforcement Learning for Optimal Agent Performance

Bin Chen; Zehong Cao

University of South Australia

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

Training an optimal policy in deep reinforcement learning (DRL) remains a significant challenge because sampling is inefficient in dynamic environments with sparse rewards. In this paper, we propose a Human Local Guide (HLG) that incorporates high-level human knowledge and local policies to guide DRL agents towards optimal performance. HLG deploys heuristic rules from human knowledge in differential decision trees and then injects them into neural networks, which can continuously improve a suboptimal global policy until it reaches the optimal level. HLG includes action guides based on a policy-switching mechanism, and adaptive action guides, inspired by an approximate policy evaluation scheme using a perturbation model, to optimise the policy further. HLG outperforms PPO and PROLONET with at least a 25% improvement in training efficiency and exploration capability on MiniGrid environments with sparse reward signals. This suggests that HLG has significant potential to continuously assist DRL agents in reaching an optimal policy in dynamic and complex environments.
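
As a rough illustration of the two ideas the abstract names, the sketch below encodes a single human rule as a soft (differentiable) decision node and then switches between that guide and a global policy. It is not the authors' implementation: the function names, the example rule, the `sharpness` parameter, and the confidence-based switching criterion are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_rule_node(x, weights, threshold, sharpness, leaf_true, leaf_false):
    """One soft decision node for the rule `weights . x > threshold`.

    The hard comparison is replaced by a sigmoid, so the output is a smooth
    blend of the two leaf action distributions and can be tuned by gradients.
    """
    margin = sum(w * xi for w, xi in zip(weights, x)) - threshold
    p_true = sigmoid(sharpness * margin)
    return [p_true * a + (1.0 - p_true) * b for a, b in zip(leaf_true, leaf_false)]


def switch_policy(guide_probs, global_probs, confidence=0.8):
    """Hypothetical switching rule: follow the guide only when it is confident."""
    return guide_probs if max(guide_probs) >= confidence else global_probs


# Assumed rule: "if the cell ahead is free (x[0] = 1), move forward; otherwise turn left".
# Action order: [forward, turn_left].
guide = soft_rule_node(x=[1.0], weights=[1.0], threshold=0.5, sharpness=10.0,
                       leaf_true=[1.0, 0.0], leaf_false=[0.0, 1.0])
action_probs = switch_policy(guide, global_probs=[0.5, 0.5])
```

With a clearly free cell ahead, the guide puts almost all probability on moving forward and the switch follows it; when the input sits on the rule boundary (for example `x=[0.5]`), the guide is undecided and the sketch falls back to the global policy.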

Original language	English
Pages (from-to)	2189-2191
Number of pages	3
Journal	Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume	2024-May
Publication status	Published - 2024
Externally published	Yes
Event	23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024 - Auckland, New Zealand Duration: 6 May 2024 → 10 May 2024

Keywords

Deep Reinforcement Learning
Differential Decision Trees
Human Knowledge
Local Guide
Training Efficiency

Cite this

@article{8a844190ab1f461c8a408dc4809e76e6,

title = "HLG: Bridging Human Heuristic Knowledge and Deep Reinforcement Learning for Optimal Agent Performance",

abstract = "Training an optimal policy in deep reinforcement learning (DRL) remains a significant challenge because sampling is inefficient in dynamic environments with sparse rewards. In this paper, we propose a Human Local Guide (HLG) that incorporates high-level human knowledge and local policies to guide DRL agents towards optimal performance. HLG deploys heuristic rules from human knowledge in differential decision trees and then injects them into neural networks, which can continuously improve a suboptimal global policy until it reaches the optimal level. HLG includes action guides based on a policy-switching mechanism, and adaptive action guides, inspired by an approximate policy evaluation scheme using a perturbation model, to optimise the policy further. HLG outperforms PPO and PROLONET with at least a 25\% improvement in training efficiency and exploration capability on MiniGrid environments with sparse reward signals. This suggests that HLG has significant potential to continuously assist DRL agents in reaching an optimal policy in dynamic and complex environments.",

keywords = "Deep Reinforcement Learning, Differential Decision Trees, Human Knowledge, Local Guide, Training Efficiency",

author = "Bin Chen and Zehong Cao",

note = "Publisher Copyright: {\textcopyright} 2024 International Foundation for Autonomous Agents and Multiagent Systems.; 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024 ; Conference date: 06-05-2024 Through 10-05-2024",

year = "2024",

language = "English",

volume = "2024-May",

pages = "2189--2191",

journal = "Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS",

issn = "1548-8403",

}

TY - JOUR

T1 - HLG: Bridging Human Heuristic Knowledge and Deep Reinforcement Learning for Optimal Agent Performance

T2 - 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024

AU - Chen, Bin

AU - Cao, Zehong

PY - 2024

Y1 - 2024

N2 - Training an optimal policy in deep reinforcement learning (DRL) remains a significant challenge because sampling is inefficient in dynamic environments with sparse rewards. In this paper, we propose a Human Local Guide (HLG) that incorporates high-level human knowledge and local policies to guide DRL agents towards optimal performance. HLG deploys heuristic rules from human knowledge in differential decision trees and then injects them into neural networks, which can continuously improve a suboptimal global policy until it reaches the optimal level. HLG includes action guides based on a policy-switching mechanism, and adaptive action guides, inspired by an approximate policy evaluation scheme using a perturbation model, to optimise the policy further. HLG outperforms PPO and PROLONET with at least a 25% improvement in training efficiency and exploration capability on MiniGrid environments with sparse reward signals. This suggests that HLG has significant potential to continuously assist DRL agents in reaching an optimal policy in dynamic and complex environments.

AB - Training an optimal policy in deep reinforcement learning (DRL) remains a significant challenge because sampling is inefficient in dynamic environments with sparse rewards. In this paper, we propose a Human Local Guide (HLG) that incorporates high-level human knowledge and local policies to guide DRL agents towards optimal performance. HLG deploys heuristic rules from human knowledge in differential decision trees and then injects them into neural networks, which can continuously improve a suboptimal global policy until it reaches the optimal level. HLG includes action guides based on a policy-switching mechanism, and adaptive action guides, inspired by an approximate policy evaluation scheme using a perturbation model, to optimise the policy further. HLG outperforms PPO and PROLONET with at least a 25% improvement in training efficiency and exploration capability on MiniGrid environments with sparse reward signals. This suggests that HLG has significant potential to continuously assist DRL agents in reaching an optimal policy in dynamic and complex environments.

KW - Deep Reinforcement Learning

KW - Differential Decision Trees

KW - Human Knowledge

KW - Local Guide

KW - Training Efficiency

UR - http://www.scopus.com/inward/record.url?scp=85196373954&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85196373954

SN - 1548-8403

VL - 2024-May

SP - 2189

EP - 2191

JO - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS

JF - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS

Y2 - 6 May 2024 through 10 May 2024

ER -
