TY - GEN
T1 - GP-PAIL
T2 - 7th IEEE International Conference on Big Data and Artificial Intelligence, BDAI 2024
AU - Li, Yulong
AU - Wang, Boqian
AU - Su, Jionglong
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Traditional multi-agent reinforcement learning algorithms are unsuitable for massive-agent environments, where the problems of credit assignment, dense reward function design, and staged curriculum design become pronounced. Since massive-agent environments can only give sparse rewards to the agents, these algorithms face difficulty in learning effective actions. While specially designed dense reward functions can help the agents obtain more reward signals, the algorithms face a trade-off between convergence speed and generalization ability. Although a hand-crafted reward function can provide frequent feedback that accelerates the learning of the agents, excessively detailed rewards may cause the agents to focus on short-term rewards and overlook long-term goals, resulting in a sub-optimal strategy. To address this, we propose GP-PAIL (Generative Pixel-to-Pixel Adversarial Imitation Learning), a novel generative adversarial imitation learning algorithm that uses a pixel-to-pixel policy structure for centralized control. It mitigates the issues of fixed behavioral patterns and credit assignment inherent in specially designed dense reward functions and staged curricula, enhancing imitation learning in massive-agent environments. Experimental results demonstrate the efficacy of GP-PAIL, which achieves a 92% win rate against the best12 algorithm. Furthermore, in terms of early skill learning speed, it learns nearly 3.25 times faster, measured in number of episodes, than the current state-of-the-art best32 algorithm.
AB - Traditional multi-agent reinforcement learning algorithms are unsuitable for massive-agent environments, where the problems of credit assignment, dense reward function design, and staged curriculum design become pronounced. Since massive-agent environments can only give sparse rewards to the agents, these algorithms face difficulty in learning effective actions. While specially designed dense reward functions can help the agents obtain more reward signals, the algorithms face a trade-off between convergence speed and generalization ability. Although a hand-crafted reward function can provide frequent feedback that accelerates the learning of the agents, excessively detailed rewards may cause the agents to focus on short-term rewards and overlook long-term goals, resulting in a sub-optimal strategy. To address this, we propose GP-PAIL (Generative Pixel-to-Pixel Adversarial Imitation Learning), a novel generative adversarial imitation learning algorithm that uses a pixel-to-pixel policy structure for centralized control. It mitigates the issues of fixed behavioral patterns and credit assignment inherent in specially designed dense reward functions and staged curricula, enhancing imitation learning in massive-agent environments. Experimental results demonstrate the efficacy of GP-PAIL, which achieves a 92% win rate against the best12 algorithm. Furthermore, in terms of early skill learning speed, it learns nearly 3.25 times faster, measured in number of episodes, than the current state-of-the-art best32 algorithm.
KW - Behavioral Cloning
KW - Generative Adversarial Imitation Learning
KW - LUX
KW - Massive-Agent Reinforcement Learning Environments
KW - Pixel-To-Pixel Policy Architecture
UR - http://www.scopus.com/inward/record.url?scp=85206925771&partnerID=8YFLogxK
U2 - 10.1109/BDAI62182.2024.10692671
DO - 10.1109/BDAI62182.2024.10692671
M3 - Conference Proceeding
AN - SCOPUS:85206925771
T3 - 2024 IEEE 7th International Conference on Big Data and Artificial Intelligence, BDAI 2024
SP - 314
EP - 322
BT - 2024 IEEE 7th International Conference on Big Data and Artificial Intelligence, BDAI 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 July 2024 through 7 July 2024
ER -