Model-free reinforcement learning for stochastic parity games



Hahn, EM, Perez, M, Schewe, S ORCID: 0000-0002-9093-9518, Somenzi, F, Trivedi, A and Wojtczak, D ORCID: 0000-0001-5560-0546
(2020) Model-free reinforcement learning for stochastic parity games. In: CONCUR 2020, 2020-9-1 - 2020-9-4.

[thumbnail of Model_Free_Reinforcement_Learning_for_Stochastic_Parity_Games.pdf] Text
Model_Free_Reinforcement_Learning_for_Stochastic_Parity_Games.pdf - Author Accepted Manuscript

Download (578kB) | Preview

Abstract

This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena - a stochastic game graph with unknown but fixed probability distributions - to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require the knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 112-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluations of both reductions.

Item Type: Conference or Workshop Item (Unspecified)
Depositing User: Symplectic Admin
Date Deposited: 17 Jul 2020 10:35
Last Modified: 08 Jun 2024 13:06
DOI: 10.4230/LIPIcs.CONCUR.2020.21
URI: https://livrepository.liverpool.ac.uk/id/eprint/3094325