Abstract
Training an optimal policy in deep reinforcement learning (DRL) remains a significant challenge because sampling is inefficient in dynamic environments with sparse rewards. In this paper, we propose a Human Local Guide (HLG) that incorporates high-level human knowledge and local policies to guide DRL agents towards optimal performance. HLG encodes heuristic rules derived from human knowledge as differentiable decision trees and injects them into neural networks, so that an initially suboptimal global policy can be improved continuously until it reaches the optimal level. HLG comprises action guides based on a policy-switching mechanism and adaptive action guides inspired by an approximate policy evaluation scheme that uses a perturbation model to optimise the policy further. On MiniGrid environments with sparse reward signals, HLG outperforms PPO and ProLoNet by at least 25% in training efficiency and exploration capability. This suggests that HLG has significant potential to continuously assist DRL agents in achieving optimal policies in dynamic and complex environments.
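This record is an abstract only and includes no code. As a rough, illustrative sketch of the two mechanisms it names, the Python snippet below shows (i) a depth-one differentiable decision tree whose split and leaf parameters could be seeded from a hand-written heuristic rule, in the spirit of ProLoNet, and (ii) a simple confidence-based switch between the learned policy and the guide. Every name here (`DifferentiableDecisionTree`, `select_action`, `confidence_threshold`) and the switching criterion are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DifferentiableDecisionTree(nn.Module):
    """Depth-one soft decision tree (a ProLoNet-style sketch).

    The split computes sigmoid(w.x + b); each leaf holds logits over
    actions. The output is the routing-probability-weighted mixture of
    the two leaf distributions, so the tree is fully differentiable and
    can be trained jointly with the neural policy it is injected into.
    """

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        # Split and leaf parameters could be initialised from a
        # hand-written heuristic rule (illustrative assumption),
        # e.g. "if the goal is to the right, prefer moving right".
        self.w = nn.Parameter(torch.zeros(obs_dim))
        self.b = nn.Parameter(torch.zeros(1))
        self.left_logits = nn.Parameter(torch.zeros(n_actions))
        self.right_logits = nn.Parameter(torch.zeros(n_actions))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Soft routing probability for the right branch, shape (batch, 1).
        p_right = torch.sigmoid(obs @ self.w + self.b).unsqueeze(-1)
        left = torch.softmax(self.left_logits, dim=-1)
        right = torch.softmax(self.right_logits, dim=-1)
        # Mixture of the two leaf action distributions, shape (batch, n_actions).
        return p_right * right + (1.0 - p_right) * left


def select_action(policy_probs: torch.Tensor,
                  guide_probs: torch.Tensor,
                  confidence_threshold: float = 0.5) -> torch.Tensor:
    """Hypothetical policy switch: defer to the human guide whenever the
    learned policy's best action falls below a confidence threshold."""
    confident = policy_probs.max(dim=-1).values >= confidence_threshold
    probs = torch.where(confident.unsqueeze(-1), policy_probs, guide_probs)
    return torch.distributions.Categorical(probs=probs).sample()
```

In an HLG-style setup one would expect the guide's parameters to be trained jointly with the rest of the network, so that the injected human knowledge serves as a starting point that keeps improving rather than a fixed rule set.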
Original language | English |
---|---|
Pages (from-to) | 2189-2191 |
Number of pages | 3 |
Journal | Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS |
Volume | 2024-May |
Publication status | Published - 2024 |
Externally published | Yes |
Event | 23rd International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2024, Auckland, New Zealand, 6–10 May 2024 |
Keywords
- Deep Reinforcement Learning
- Differentiable Decision Trees
- Human Knowledge
- Local Guide
- Training Efficiency