Model-Free Reinforcement Learning for Branching Markov Decision Processes



Hahn, Ernst Moritz, Perez, Mateo, Schewe, Sven ORCID: 0000-0002-9093-9518, Somenzi, Fabio, Trivedi, Ashutosh and Wojtczak, Dominik ORCID: 0000-0001-5560-0546
(2021) Model-Free Reinforcement Learning for Branching Markov Decision Processes. COMPUTER AIDED VERIFICATION, PT II, CAV 2021, 12760. pp. 651-673.

[thumbnail of 2106.06777v1.pdf] Text
2106.06777v1.pdf - Author Accepted Manuscript

Download (928kB) | Preview

Abstract

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMCs is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of a each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.

Item Type: Article
Additional Information: to appear in CAV 2021
Uncontrolled Keywords: cs.LG, cs.LG, cs.LO, cs.SY, eess.SY
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 14 Oct 2021 07:31
Last Modified: 18 Jan 2023 21:27
DOI: 10.1007/978-3-030-81688-9_30
Related URLs:
URI: https://livrepository.liverpool.ac.uk/id/eprint/3140282