Lanctot, Marc, Zambaldi, Vinicius, Gruslys, Audrunas, Lazaridou, Angeliki, Tuyls, Karl, Perolat, Julien, Silver, David and Graepel, Thore (2017) A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. In: The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS 2017), Advances in Neural Information Processing Systems 30, pp. 4191-4204.
1711.00832v2.pdf - Submitted version (793kB)
Abstract
To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents' policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe an algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The algorithm generalizes previous ones such as InRL, iterated best response, double oracle, and fictitious play. Then, we present a scalable implementation which reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in two partially observable settings: gridworld coordination games and poker.
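The algorithm the abstract describes alternates two steps: an oracle step that trains an approximate best response to a mixture over the other agents' existing policies, and a meta step that rebuilds the empirical (meta) game over the grown policy populations and re-solves it for a new meta-strategy. The sketch below illustrates that loop on a toy two-player zero-sum matrix game, assuming exact best responses and fictitious play as the meta-solver; the names `payoff`, `best_response`, and `solve_meta_game` are illustrative placeholders, not the paper's implementation, which uses deep reinforcement learning oracles and decoupled meta-solvers.

```python
# Minimal sketch of the best-response / meta-game loop from the abstract.
# Toy setting only: exact best responses on a random zero-sum matrix game,
# fictitious play as the meta-solver (the paper uses deep RL oracles and
# decoupled, memory-efficient meta-solvers instead).
import numpy as np

rng = np.random.default_rng(0)
payoff = rng.normal(size=(20, 20))         # row player's payoff; zero-sum game

def best_response(opponent_policies, opponent_mixture, for_row_player):
    """Exact best response to the opponent's mixture over its policy set."""
    if for_row_player:
        # expected payoff of each row action vs. the mixed column strategy
        values = payoff[:, opponent_policies] @ opponent_mixture
        return int(np.argmax(values))
    values = opponent_mixture @ payoff[opponent_policies, :]
    return int(np.argmin(values))          # column player minimises row payoff

def solve_meta_game(meta_payoffs, iters=2000):
    """Fictitious play on the restricted (empirical) game; returns mixtures."""
    n_rows, n_cols = meta_payoffs.shape
    row_counts, col_counts = np.ones(n_rows), np.ones(n_cols)
    for _ in range(iters):
        row_counts[np.argmax(meta_payoffs @ (col_counts / col_counts.sum()))] += 1
        col_counts[np.argmin((row_counts / row_counts.sum()) @ meta_payoffs)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

row_pop, col_pop = [0], [0]                # start each player with one policy
row_meta, col_meta = np.array([1.0]), np.array([1.0])

for _ in range(10):
    # Oracle step: best respond to the other player's current meta-strategy.
    row_pop.append(best_response(col_pop, col_meta, for_row_player=True))
    col_pop.append(best_response(row_pop[:-1], row_meta, for_row_player=False))
    # Meta step: rebuild the empirical game over the populations and re-solve.
    meta = payoff[np.ix_(row_pop, col_pop)]
    row_meta, col_meta = solve_meta_game(meta)

print("meta-game value:", row_meta @ payoff[np.ix_(row_pop, col_pop)] @ col_meta)
```

With a uniform meta-strategy this loop reduces to fictitious play, and with a single opponent policy it reduces to iterated best response, which is the sense in which the abstract says the algorithm generalizes those methods.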
| Field | Value |
|---|---|
| Item Type | Article |
| Additional Information | Camera-ready copy of NIPS 2017 paper, including appendix |
| Uncontrolled Keywords | cs.AI, cs.GT, cs.LG, cs.MA |
| Depositing User | Symplectic Admin |
| Date Deposited | 24 Jan 2018 16:17 |
| Last Modified | 19 Jan 2023 06:42 |
| URI | https://livrepository.liverpool.ac.uk/id/eprint/3016687 |