Palmer, Gregory, Tuyls, Karl, Bloembergen, Daan and Savani, Rahul (ORCID: 0000-0003-1262-7831) (2018)
Lenient Multi-Agent Deep Reinforcement Learning.
In: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 1, Stockholm.
Abstract
Much of the success of single-agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERMs), which allow Deep Q-Networks (DQNs) to be trained efficiently by sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated when agents update their policies in parallel \cite{foerster2017stabilising}. In this work we apply leniency \cite{panait2006lenient} to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates sampled from the ERM. This introduces optimism into the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm \cite{omidshafiei2017deep}, as well as a modified version we call scheduled-HDQN, which uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) \cite{bucsoniu2010multi}. We find that LDQN agents are more likely than standard and scheduled-HDQN agents to converge to the optimal policy in a stochastic-reward CMOTP.
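The leniency mechanism described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' LDQN implementation: it uses a lookup table instead of a deep network and replay memory, and the exponential leniency function, the multiplicative temperature decay, the class name `LenientTabularLearner`, and all hyperparameter values are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


class LenientTabularLearner:
    """Tabular sketch of a lenient value update (hypothetical names and values)."""

    def __init__(self, alpha=0.1, gamma=0.95, k=2.0, decay=0.9, max_temp=1.0, seed=0):
        self.alpha = alpha    # learning rate
        self.gamma = gamma    # discount factor
        self.k = k            # leniency moderation factor (assumed form)
        self.decay = decay    # multiplicative temperature decay (assumed form)
        self.q = defaultdict(float)                    # Q-values per (state, action)
        self.temp = defaultdict(lambda: max_temp)      # temperature per (state, action)
        self.rng = random.Random(seed)

    def leniency(self, state, action):
        # Assumed mapping: high temperature gives leniency near 1,
        # temperature 0 gives leniency 0.
        return 1.0 - math.exp(-self.k * self.temp[(state, action)])

    def update(self, state, action, reward, next_q_max, terminal=False):
        key = (state, action)
        target = reward if terminal else reward + self.gamma * next_q_max
        delta = target - self.q[key]
        # Positive updates are always applied; negative updates are applied
        # only when a uniform draw exceeds the current leniency.
        applied = delta > 0 or self.rng.random() > self.leniency(state, action)
        if applied:
            self.q[key] += self.alpha * delta
        # The temperature decays each time the pair is visited,
        # so leniency fades as the pair is experienced more often.
        self.temp[key] *= self.decay
        return applied
```

Under these assumptions, a frequently visited state-action pair cools down and starts accepting negative updates, while a rarely visited pair stays optimistic and ignores most of them.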
| Item Type: | Conference Item (Unspecified) |
|---|---|
| Uncontrolled Keywords: | 46 Information and Computing Sciences, 4602 Artificial Intelligence, 4611 Machine Learning, Generic health relevance |
| Depositing User: | Symplectic Admin |
| Date Deposited: | 03 May 2018 15:14 |
| Last Modified: | 23 May 2026 01:31 |
| DOI: | 10.65109/qdcv6054 |
| URI: | https://livrepository.liverpool.ac.uk/id/eprint/3020821 |