Improved Representations for Cooperative Multi-Agent Reinforcement Learning



Castellini, Jacopo
(2022) Improved Representations for Cooperative Multi-Agent Reinforcement Learning. Doctor of Philosophy thesis, University of Liverpool.

Full text: 201289569_Jun2022.pdf (6MB)

Abstract

Multi-agent systems [33, 136] are a ubiquitous presence in our everyday life: our entire society can be seen as a huge multi-agent system in which each individual has to act in an environment populated by other entities, each motivated by its own goals and objectives. In a cooperative system, all of these entities act towards a common goal. This setting has gained a lot of popularity in the AI research community, as many real-world situations can naturally be modelled this way [150, 177, 151, 71, 109, 23]. Still, optimally solving such systems remains a challenging problem. Multi-agent reinforcement learning (MARL) is one of the most widely used techniques to tackle it: agents learn how to behave by repeatedly interacting with their environment. Although major steps have been taken in this direction, some fundamental issues arising from the presence of multiple agents that learn and act together remain. This work addresses two such aspects: team representation and the multi-agent credit assignment problem.

The former concerns how a system designer should represent and learn the team of agents. On one hand, representing the whole team as a single centralized entity may seem compelling, but this solution does not scale to larger systems. On the other hand, learning each agent independently of the others avoids that issue [146, 83, 24, 171], but introduces non-stationarity into each agent's learning experience, because the presence of the other agents is ignored. This work focuses on factorization techniques [50, 53, 52] as a middle ground between these two extremes: the idea has recently gained major interest and has served as the basis for many recent deep MARL algorithms [139, 115, 135, 158]. Although these methods perform well in practice, they have focused only on single-agent decompositions, leaving "higher-order" factorizations almost unexplored. Moreover, although factorizations are widely believed to improve performance over the two extremes above, no broad investigation of the real merits or general applicability of factored techniques has been conducted so far. This work fills that gap by investigating a wide array of factored methods on a diverse set of cooperative scenarios, assessing their performance both in the accuracy of the represented functions and in action selection.

For the multi-agent credit assignment problem [20, 94, 178, 168], many techniques have been proposed, including difference rewards [169, 168], one of the most popular families of methods for this problem, but few have been extended to the deep MARL framework. One such extension is Counterfactual Multi-Agent Policy Gradients (COMA) [40], which employs difference rewards to provide each agent with an individual signal to learn from. This algorithm has, however, proved to perform poorly in practice [160, 181, 63, 82]: the cause lies in the centralized critic that COMA uses to estimate these values. Such a critic is difficult to learn because of compounding factors, and may thus provide inaccurate or wrong values to the agents. For this reason, two novel algorithms, named Dr.Reinforce and Dr.ReinforceR, are proposed. These avoid the above difficulties by applying difference rewards to the system reward function, either by accessing it directly or by learning it with a centralized network.
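For concreteness, the two core ideas can be sketched with standard notation from the factored MARL and difference rewards literature; the symbols below (Q_i, Q_{ij}, D_i, c_i) are the generic ones from that literature, not necessarily those used in the thesis:

    % Factored joint action-value function: per-agent terms Q_i plus,
    % in a "higher-order" factorization, pairwise terms Q_{ij}.
    Q(s, \mathbf{a}) \approx \sum_{i} Q_i(s, a_i) + \sum_{i < j} Q_{ij}(s, a_i, a_j)

    % Difference reward for agent i: the system reward minus the reward
    % obtained when agent i's action is replaced by a default action c_i.
    D_i(s, \mathbf{a}) = r(s, \mathbf{a}) - r\big(s, (\mathbf{a}_{-i}, c_i)\big)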
The results show improvements over COMA, pointing out how learning a more accurate representation is a key factor towards a wider applicability of difference rewards for solving the multi-agent credit assignment problem. The original contributions presented in this thesis could deepen the understanding of multi-agent reinforcement learning by providing evidence that carefully designed alternative representations are indeed useful in improving the learning of multiple agents, and that they help counteract some fundamental problems that characterize such systems. Moreover, novel techniques could build upon the solutions investigated in this work to further improve performance or to tackle increasingly complex problems.
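As a rough illustration of the difference-rewards mechanism behind Dr.Reinforce, the following minimal Python sketch trains independent REINFORCE-style policies on a two-agent matrix game. The environment, payoff matrix, and default action here are invented for illustration, and the actual algorithms in the thesis differ in detail (Dr.ReinforceR, for instance, learns the reward function with a centralized network rather than accessing it directly):

    import numpy as np

    # Hypothetical two-agent, stateless cooperative game: the team reward
    # depends on the joint action. This payoff table stands in for the
    # "system reward function" that Dr.Reinforce is assumed to access.
    N_AGENTS, N_ACTIONS = 2, 3
    PAYOFF = np.array([[5.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 2.0]])

    def reward_fn(joint_action):
        return PAYOFF[joint_action[0], joint_action[1]]

    def difference_reward(joint_action, i, default_action=0):
        """D_i = r(a) - r(a with agent i's action replaced by a default)."""
        counterfactual = list(joint_action)
        counterfactual[i] = default_action
        return reward_fn(joint_action) - reward_fn(counterfactual)

    # Tabular softmax policy per agent, updated with REINFORCE-style steps.
    logits = np.zeros((N_AGENTS, N_ACTIONS))
    rng = np.random.default_rng(0)
    lr = 0.1

    for _ in range(2000):
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        joint = [rng.choice(N_ACTIONS, p=probs[i]) for i in range(N_AGENTS)]
        for i in range(N_AGENTS):
            d_i = difference_reward(joint, i)   # per-agent credit signal
            grad = -probs[i]                    # d log pi / d logits ...
            grad[joint[i]] += 1.0               # ... for a softmax policy
            logits[i] += lr * d_i * grad

    print("final per-agent action probabilities:\n", probs.round(2))

The key point is that each agent is updated with its own difference reward D_i rather than the raw team reward, so an agent whose action did not change the outcome relative to the counterfactual default receives (near-)zero credit.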

Item Type: Thesis (Doctor of Philosophy)
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 04 Aug 2022 08:43
Last Modified: 18 Jan 2023 20:56
DOI: 10.17638/03158406
Supervisors:
URI: https://livrepository.liverpool.ac.uk/id/eprint/3158406