Regularized Softmax Deep Multi-Agent <i>Q-</i>Learning

Pan, Ling, Rashid, Tabish, Peng, Bei ORCID: 0000-0003-0152-3180, Huang, Longbo and Whiteson, Shimon
(2021) Regularized Softmax Deep Multi-Agent <i>Q-</i>Learning. In: Thirty-fifth Conference on Neural Information Processing Systems, 2021-12-6 - 2021-12-14, Online.

sr_marl.pdf - Author Accepted Manuscript


Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers in practice from more severe overestimation than previously acknowledged, which is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline, and demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results on a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
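To make the two ingredients of the abstract concrete, the following is a minimal sketch of (a) a Boltzmann softmax operator over action values, which interpolates between the mean and the max and thereby softens the max-induced overestimation bias, and (b) a bootstrapped target with a penalty pulling the estimate toward a baseline. The names `regularized_target`, `lam`, and `baseline` are illustrative placeholders, not the paper's exact formulation or its multi-agent approximation scheme.

```python
import math

def softmax_operator(q_values, beta=5.0):
    # Boltzmann softmax over action values:
    # sm_beta(Q) = sum_a [exp(beta*Q(a)) / sum_b exp(beta*Q(b))] * Q(a).
    # beta -> inf recovers max_a Q(a); beta -> 0 gives the plain mean.
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total

def regularized_target(reward, gamma, next_q_values, baseline,
                       lam=0.1, beta=5.0):
    # Bootstrapped TD target built from the softmax operator, with a
    # penalty term pulling the joint action-value toward `baseline`
    # (illustrative knob, not the paper's exact regularizer).
    boot = softmax_operator(next_q_values, beta)
    return reward + gamma * (boot - lam * (boot - baseline))
```

With a high inverse temperature `beta` the operator behaves like the standard max-based target, while smaller values dampen the selection of spuriously large Q-estimates; the penalty term additionally discourages the bootstrapped joint action-value from drifting far above the baseline.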

Item Type: Conference or Workshop Item (Unspecified)
Uncontrolled Keywords: multi-agent reinforcement learning, value factorization, overestimation
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 27 Oct 2021 08:30
Last Modified: 14 Oct 2023 23:24