Palmer, Gregory, Tuyls, Karl, Bloembergen, Daan and Savani, Rahul ORCID: 0000-0003-1262-7831 (2018) Lenient Multi-Agent Deep Reinforcement Learning.
Text: 1707.04402v1.pdf - Submitted version (479kB)
Abstract
Much of the success of single-agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated because agents update their policies in parallel [11]. In this work we apply leniency [23] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [22], as well as a modified version we call scheduled-HDQN that uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [8], which include fully-cooperative sub-tasks and stochastic rewards. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.
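To make the leniency mechanism in the abstract concrete, here is a minimal tabular sketch of a lenient Q-update. It assumes the common formulation in which leniency is computed as l(s, a) = 1 − exp(−K · T(s, a)) from a decaying per-pair temperature T(s, a); the constants (K, ALPHA, GAMMA, DECAY) are illustrative assumptions, not values from the paper.

```python
import math
import random
from collections import defaultdict

# Illustrative constants -- assumed for this sketch, not taken from the paper.
K, ALPHA, GAMMA, DECAY = 2.0, 0.1, 0.95, 0.995

Q = defaultdict(float)        # Q-values keyed by (state, action)
T = defaultdict(lambda: 1.0)  # leniency temperatures keyed by (state, action)

def lenient_update(s, a, r, s_next, actions):
    """One lenient Q-update: forgive negative TD errors while T(s, a) is high."""
    target = r + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    delta = target - Q[(s, a)]
    # Leniency l(s, a) = 1 - exp(-K * T(s, a)) shrinks as the pair's
    # temperature decays, so early pessimistic updates are mostly ignored.
    leniency = 1.0 - math.exp(-K * T[(s, a)])
    if delta > 0 or random.random() > leniency:
        Q[(s, a)] += ALPHA * delta
    T[(s, a)] *= DECAY  # cool this pair; optimism fades with experience
```

In the deep LDQN setting described in the abstract, the same forgiveness test is applied to transitions sampled from the ERM, with temperatures tracked per state-action pair rather than per network parameter.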
| Item Type: | Conference or Workshop Item (Unspecified) |
|---|---|
| Additional Information: | 9 pages, 6 figures, AAMAS2018 Conference Proceedings |
| Uncontrolled Keywords: | Multi-Agent Deep Reinforcement Learning, Leniency |
| Depositing User: | Symplectic Admin |
| Date Deposited: | 16 Oct 2017 10:16 |
| Last Modified: | 19 Jan 2023 06:53 |
| URI: | https://livrepository.liverpool.ac.uk/id/eprint/3009868 |