HLG: Bridging Human Heuristic Knowledge and Deep Reinforcement Learning for Optimal Agent Performance

Bin Chen, Zehong Cao

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

Training an optimal policy in deep reinforcement learning (DRL) remains a significant challenge because sampling is inefficient in dynamic environments with sparse rewards. In this paper, we propose a Human Local Guide (HLG) that incorporates high-level human knowledge and local policies to guide DRL agents towards optimal performance. HLG encodes heuristic rules from human knowledge in differential decision trees and then injects them into neural networks, which can continuously improve a suboptimal global policy until it reaches the optimal level. HLG includes action guides based on a policy-switching mechanism and adaptive action guides, inspired by an approximate policy evaluation scheme that uses a perturbation model, to optimise the policy further. In MiniGrid environments with sparse reward signals, HLG outperforms PPO and PROLONET with at least a 25% improvement in training efficiency and exploration capability. This suggests that HLG has significant potential to continuously assist DRL agents in achieving an optimal policy in dynamic and complex environments.
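As a rough illustration of how a human heuristic rule can be placed in a differentiable decision node, the sketch below relaxes a single if/else rule with a sigmoid so that its parameters could later be tuned by gradient descent. It is not the authors' implementation: the function name `soft_rule_policy`, the two-action setup, the observation layout and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function used to soften a hard if/else split."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_rule_policy(obs, weights, threshold, sharpness, true_probs, false_probs):
    """One soft decision node (hypothetical sketch).

    The hard rule "dot(weights, obs) > threshold" is relaxed into a
    probability, and the two leaf action distributions are mixed by it.
    """
    score = sum(w * o for w, o in zip(weights, obs))
    p_true = sigmoid(sharpness * (score - threshold))
    return [p_true * t + (1.0 - p_true) * f
            for t, f in zip(true_probs, false_probs)]


# Assumed heuristic: "if obs[0] exceeds 0.5, prefer action 0, otherwise action 1".
rule = dict(weights=[1.0, 0.0], threshold=0.5, sharpness=10.0,
            true_probs=[0.9, 0.1], false_probs=[0.1, 0.9])

far_probs = soft_rule_policy([2.0, 0.0], **rule)   # rule condition holds
near_probs = soft_rule_policy([0.0, 0.0], **rule)  # rule condition fails
```

Because every operation in the node is differentiable, `weights`, `threshold` and the leaf distributions could in principle be treated as network parameters and refined by a policy-gradient method, which is the general idea behind initialising a policy from rules and then improving it.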

Original language: English
Pages (from-to): 2189-2191
Number of pages: 3
Journal: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2024-May
Publication status: Published - 2024
Externally published: Yes
Event: 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024 - Auckland, New Zealand
Duration: 6 May 2024 → 10 May 2024

Keywords

  • Deep Reinforcement Learning
  • Differential Decision Trees
  • Human Knowledge
  • Local Guide
  • Training Efficiency
