Decentralized multi-agent cooperation via adaptive partner modeling



Xu, Chenhang, Wang, Jia, Zhu, Xiaohui ORCID: 0000-0003-1024-5442, Yue, Yong, Zhou, Weifeng, Liang, Zhixuan and Wojtczak, Dominik
(2024) Decentralized multi-agent cooperation via adaptive partner modeling. Complex & Intelligent Systems, 10 (4). pp. 4989-5004.

Access the full-text of this item by clicking on the Open Access link.

Abstract

Multi-agent reinforcement learning encounters a non-stationary challenge, where agents concurrently update their policies, leading to changes in the environment. Existing approaches have tackled this challenge through communication among agents to obtain their partners’ actions, but this introduces computational complexity known as partner sample complexity. An alternative approach is to develop partner models that generate samples instead of direct communication to mitigate this complexity. However, a discrepancy arises between the real policies distribution and the policy of partner models, termed as model bias, which can significantly impact performance when heavily relying on partner models. In order to achieve a trade-off between sample complexity and performance, a novel multi-agent model-based reinforcement learning algorithm called decentralized adaptive partner modeling (DAPM) is proposed, which utilizes fictitious self play (FSP) to construct partner models and update policies. Model bias is addressed by establishing an upper bound to restrict the usage of partner models. Coupled with that, an adaptive rollout approach is introduced, enabling real agents to dynamically communicate with partner models based on their quality, ensuring that agent performance can progressively improve with partner model samples. The effectiveness of DAPM is exhibited in two multi-agent tasks, showing that DAPM outperforms existing model-free algorithms in terms of partner sample complexity and training stability. Specifically, DAPM requires 28.5% fewer communications compared to the best baseline and exhibits reduced fluctuations in the learning curve, indicating superior performance.
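The adaptive rollout idea in the abstract (use partner-model samples more when the model is good, and stop once its bias exceeds a bound) can be illustrated with a minimal sketch. This is not the authors' DAPM implementation: the class name `AdaptivePartnerRollout`, the use of a sliding-window action mismatch rate as a stand-in for model bias, the linear schedule, and the default values are all assumptions made for illustration.

```python
from collections import deque


class AdaptivePartnerRollout:
    """Hypothetical helper: decides what fraction of partner actions
    may come from a learned partner model instead of real communication."""

    def __init__(self, error_bound=0.2, window=50, max_model_ratio=0.9):
        self.error_bound = error_bound          # assumed upper bound on tolerated model error
        self.max_model_ratio = max_model_ratio  # assumed cap on model usage
        self.errors = deque(maxlen=window)      # recent 0/1 mismatch indicators

    def record(self, predicted_action, real_action):
        # Compare the partner model's prediction with the partner's real action.
        self.errors.append(0.0 if predicted_action == real_action else 1.0)

    def model_error(self):
        # With no evidence yet, assume the worst case so the model is not trusted.
        if not self.errors:
            return 1.0
        return sum(self.errors) / len(self.errors)

    def model_ratio(self):
        # Fraction of samples to draw from the partner model.
        # Falls linearly to zero as the error approaches the bound (an assumed schedule).
        err = self.model_error()
        if err >= self.error_bound:
            return 0.0
        return self.max_model_ratio * (1.0 - err / self.error_bound)
```

In use, an agent would call `record` whenever it does communicate with a real partner, then consult `model_ratio` to decide how many of the next partner actions to draw from the model. Every saved communication in this sketch comes from a sample the model supplies in place of a real exchange.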

Item Type: Article
Uncontrolled Keywords: 46 Information and Computing Sciences, 4602 Artificial Intelligence, 4611 Machine Learning, Bioengineering
Depositing User: Symplectic Admin
Date Deposited: 04 Jul 2024 15:41
Last Modified: 25 Jul 2024 11:24
DOI: 10.1007/s40747-024-01421-3
Open Access URL: https://doi.org/10.1007/s40747-024-01421-3
URI: https://livrepository.liverpool.ac.uk/id/eprint/3182652