Decentralized multi-agent cooperation via adaptive partner modeling

Chenhang Xu; Jia Wang; Xiaohui Zhu; Yong Yue; Weifeng Zhou; Zhixuan Liang; Dominik Wojtczak

doi:10.1007/s40747-024-01421-3

Corresponding author: Xiaohui Zhu

Research output: Contribution to journal › Article › peer-review

Abstract

Multi-agent reinforcement learning faces the challenge of non-stationarity: because agents update their policies concurrently, the environment that each agent experiences keeps changing. Existing approaches tackle this challenge through communication among agents to obtain their partners’ actions, but this introduces a computational cost known as partner sample complexity. An alternative approach is to build partner models that generate samples in place of direct communication, which reduces this cost. However, a discrepancy arises between the distribution of the real partner policies and the policies of the partner models, termed model bias, which can significantly degrade performance when agents rely heavily on partner models. To trade off sample complexity against performance, a novel multi-agent model-based reinforcement learning algorithm called decentralized adaptive partner modeling (DAPM) is proposed, which uses fictitious self-play (FSP) to construct partner models and update policies. Model bias is addressed by establishing an upper bound that restricts the usage of partner models. In addition, an adaptive rollout approach is introduced that lets real agents dynamically communicate with partner models according to the models' quality, so that agent performance can progressively improve with partner-model samples. The effectiveness of DAPM is demonstrated on two multi-agent tasks, where DAPM outperforms existing model-free algorithms in terms of partner sample complexity and training stability. Specifically, DAPM requires 28.5% fewer communications than the best baseline and exhibits smaller fluctuations in its learning curve, indicating superior performance.
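The abstract names DAPM's ingredients without giving their definitions. The toy Python sketch below illustrates two of the ideas under loudly stated assumptions; it is not the authors' DAPM algorithm. First, a partner model built by classic fictitious play (the empirical average of a partner's past actions, best-responded to in a hypothetical 2×2 coordination game). Second, a hypothetical adaptive-rollout rule that allows fewer model-generated steps as model bias grows and none once the bias reaches an upper bound. Measuring model bias as total variation distance is an assumption, and `FictitiousPartnerModel`, `model_bias`, `rollout_length`, `bias_bound` and `max_len` are illustrative names that do not come from the paper.

```python
import numpy as np

# Hypothetical 2x2 coordination game: both agents get reward 1 when actions match.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 1.0]])


class FictitiousPartnerModel:
    """Hypothetical partner model: the empirical (time-averaged) distribution
    of a partner's observed actions, as in classic fictitious play."""

    def __init__(self, n_actions: int) -> None:
        self.counts = np.ones(n_actions)  # uniform prior (add-one smoothing)

    def update(self, partner_action: int) -> None:
        self.counts[partner_action] += 1

    def predict(self) -> np.ndarray:
        return self.counts / self.counts.sum()


def best_response(partner_dist: np.ndarray) -> int:
    """Own action maximizing expected payoff against the modeled partner;
    np.argmax breaks ties toward the lowest action index."""
    return int(np.argmax(PAYOFF @ partner_dist))


def model_bias(predicted: np.ndarray, recent_actions: list) -> float:
    """Assumed bias measure: total variation distance between the model's
    prediction and the empirical distribution of recent real partner actions."""
    empirical = np.bincount(recent_actions, minlength=len(predicted)) / len(recent_actions)
    return 0.5 * float(np.abs(predicted - empirical).sum())


def rollout_length(bias: float, bias_bound: float, max_len: int) -> int:
    """Hypothetical adaptive rule: no model rollouts once the bias reaches the
    upper bound; otherwise more model-generated steps the smaller the bias."""
    if bias >= bias_bound:
        return 0
    return int(round(max_len * (1.0 - bias / bias_bound)))


# Two agents play fictitious play against each other for 20 rounds.
model_a, model_b = FictitiousPartnerModel(2), FictitiousPartnerModel(2)
history_b = []
for _ in range(20):
    a = best_response(model_a.predict())  # A responds to its model of B
    b = best_response(model_b.predict())  # B responds to its model of A
    model_a.update(b)
    model_b.update(a)
    history_b.append(b)

bias = model_bias(model_a.predict(), history_b[-10:])
print(a, b, round(bias, 3), rollout_length(bias, bias_bound=0.2, max_len=5))
# prints: 0 0 0.045 4
```

Both agents start from a uniform model and break ties toward action 0, so they lock onto the coordinated outcome (0, 0). As evidence accumulates, the bias shrinks and the sketch permits more model rollouts. A realistic setting would replace the matrix game and count-based model with learned policies and learned partner models.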

Original language	English
Pages (from-to)	4989-5004
Number of pages	16
Journal	Complex and Intelligent Systems
Volume	10
Issue number	4
DOIs	https://doi.org/10.1007/s40747-024-01421-3
Publication status	Accepted/In press - 2024

Keywords

Fictitious self play
Multi-agent reinforcement learning
Partner modeling
Partner sample complexity


Cite this

@article{8993039d2b234bec9cead65c5f2ca9e7,
  title = "Decentralized multi-agent cooperation via adaptive partner modeling",
  abstract = "Multi-agent reinforcement learning encounters a non-stationary challenge, where agents concurrently update their policies, leading to changes in the environment. Existing approaches have tackled this challenge through communication among agents to obtain their partners{\textquoteright} actions, but this introduces computational complexity known as partner sample complexity. An alternative approach is to develop partner models that generate samples instead of direct communication to mitigate this complexity. However, a discrepancy arises between the real policies distribution and the policy of partner models, termed as model bias, which can significantly impact performance when heavily relying on partner models. In order to achieve a trade-off between sample complexity and performance, a novel multi-agent model-based reinforcement learning algorithm called decentralized adaptive partner modeling (DAPM) is proposed, which utilizes fictitious self play (FSP) to construct partner models and update policies. Model bias is addressed by establishing an upper bound to restrict the usage of partner models. Coupled with that, an adaptive rollout approach is introduced, enabling real agents to dynamically communicate with partner models based on their quality, ensuring that agent performance can progressively improve with partner model samples. The effectiveness of DAPM is exhibited in two multi-agent tasks, showing that DAPM outperforms existing model-free algorithms in terms of partner sample complexity and training stability. Specifically, DAPM requires 28.5\% fewer communications compared to the best baseline and exhibits reduced fluctuations in the learning curve, indicating superior performance.",
  keywords = "Fictitious self play, Multi-agent reinforcement learning, Partner modeling, Partner sample complexity",
  author = "Chenhang Xu and Jia Wang and Xiaohui Zhu and Yong Yue and Weifeng Zhou and Zhixuan Liang and Dominik Wojtczak",
  note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",
  year = "2024",
  doi = "10.1007/s40747-024-01421-3",
  language = "English",
  volume = "10",
  pages = "4989--5004",
  journal = "Complex and Intelligent Systems",
  issn = "2199-4536",
  number = "4",
}

TY  - JOUR
T1  - Decentralized multi-agent cooperation via adaptive partner modeling
AU  - Xu, Chenhang
AU  - Wang, Jia
AU  - Zhu, Xiaohui
AU  - Yue, Yong
AU  - Zhou, Weifeng
AU  - Liang, Zhixuan
AU  - Wojtczak, Dominik
N1  - Publisher Copyright: © The Author(s) 2024.
PY  - 2024
Y1  - 2024
N2  - Multi-agent reinforcement learning encounters a non-stationary challenge, where agents concurrently update their policies, leading to changes in the environment. Existing approaches have tackled this challenge through communication among agents to obtain their partners’ actions, but this introduces computational complexity known as partner sample complexity. An alternative approach is to develop partner models that generate samples instead of direct communication to mitigate this complexity. However, a discrepancy arises between the real policies distribution and the policy of partner models, termed as model bias, which can significantly impact performance when heavily relying on partner models. In order to achieve a trade-off between sample complexity and performance, a novel multi-agent model-based reinforcement learning algorithm called decentralized adaptive partner modeling (DAPM) is proposed, which utilizes fictitious self play (FSP) to construct partner models and update policies. Model bias is addressed by establishing an upper bound to restrict the usage of partner models. Coupled with that, an adaptive rollout approach is introduced, enabling real agents to dynamically communicate with partner models based on their quality, ensuring that agent performance can progressively improve with partner model samples. The effectiveness of DAPM is exhibited in two multi-agent tasks, showing that DAPM outperforms existing model-free algorithms in terms of partner sample complexity and training stability. Specifically, DAPM requires 28.5% fewer communications compared to the best baseline and exhibits reduced fluctuations in the learning curve, indicating superior performance.
AB  - Multi-agent reinforcement learning encounters a non-stationary challenge, where agents concurrently update their policies, leading to changes in the environment. Existing approaches have tackled this challenge through communication among agents to obtain their partners’ actions, but this introduces computational complexity known as partner sample complexity. An alternative approach is to develop partner models that generate samples instead of direct communication to mitigate this complexity. However, a discrepancy arises between the real policies distribution and the policy of partner models, termed as model bias, which can significantly impact performance when heavily relying on partner models. In order to achieve a trade-off between sample complexity and performance, a novel multi-agent model-based reinforcement learning algorithm called decentralized adaptive partner modeling (DAPM) is proposed, which utilizes fictitious self play (FSP) to construct partner models and update policies. Model bias is addressed by establishing an upper bound to restrict the usage of partner models. Coupled with that, an adaptive rollout approach is introduced, enabling real agents to dynamically communicate with partner models based on their quality, ensuring that agent performance can progressively improve with partner model samples. The effectiveness of DAPM is exhibited in two multi-agent tasks, showing that DAPM outperforms existing model-free algorithms in terms of partner sample complexity and training stability. Specifically, DAPM requires 28.5% fewer communications compared to the best baseline and exhibits reduced fluctuations in the learning curve, indicating superior performance.
KW  - Fictitious self play
KW  - Multi-agent reinforcement learning
KW  - Partner modeling
KW  - Partner sample complexity
UR  - http://www.scopus.com/inward/record.url?scp=85190524393&partnerID=8YFLogxK
U2  - 10.1007/s40747-024-01421-3
DO  - 10.1007/s40747-024-01421-3
M3  - Article
AN  - SCOPUS:85190524393
SN  - 2199-4536
VL  - 10
SP  - 4989
EP  - 5004
JO  - Complex and Intelligent Systems
JF  - Complex and Intelligent Systems
IS  - 4
ER  -
