Solving Multi-agent MDPs Optimally with Conditional Return Graphs

(2015) Solving Multi-agent MDPs Optimally with Conditional Return Graphs. In: The Tenth AAMAS Workshop on Multi-Agent Sequential Decision Making in Uncertain Domains (MSDM), Istanbul, Turkey. (Submitted)

WarningThere is a more recent version of this item available.
[img] Text

Download (594kB)


In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate in order find an optimal joint policy that maximises joint value. Typical solution al- gorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP setting (MMDP) such structure is not present. We propose a new optimal solver for so-called TI-MMDPs, where agents can only af- fect their local state, while their value may depend on the state of others. We decompose the returns into local returns per agent that we represent compactly in a conditional re- turn graph (CRG). Using CRGs the value of a joint policy as well as bounds on the value of partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the avail- able alternatives and is able to find solutions to problems previously considered unsolvable.

Item Type: Conference or Workshop Item (Paper)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: Symplectic Admin
Date Deposited: 20 Oct 2015 07:38
Last Modified: 31 Mar 2016 12:09

Available Versions of this Item

Repository Staff Access