Hahn, Ernst Moritz, Perez, Mateo, Schewe, Sven
ORCID: 0000-0002-9093-9518, Somenzi, Fabio, Trivedi, Ashutosh
ORCID: 0000-0001-9346-0126 and Wojtczak, Dominik
ORCID: 0000-0001-5560-0546
(2021)
Model-Free Reinforcement Learning for Branching Markov Decision Processes.
COMPUTER AIDED VERIFICATION, PT II, CAV 2021, 12760.
pp. 651-673.
ISSN 0302-9743, 1611-3349
|
Text
2106.06777v1.pdf - Author Accepted Manuscript Download (928kB) | Preview |
Abstract
We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMCs is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of a each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
| Item Type: | Article |
|---|---|
| Additional Information: | to appear in CAV 2021 |
| Uncontrolled Keywords: | cs.LG, cs.LG, cs.LO, cs.SY, eess.SY |
| Divisions: | Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science |
| Depositing User: | Symplectic Admin |
| Date Deposited: | 14 Oct 2021 07:31 |
| Last Modified: | 06 Dec 2024 21:19 |
| DOI: | 10.1007/978-3-030-81688-9_30 |
| Related URLs: | |
| URI: | https://livrepository.liverpool.ac.uk/id/eprint/3140282 |
Altmetric
Altmetric