Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives



Hahn, Ernst Moritz; Perez, Mateo; Schewe, Sven (ORCID: 0000-0002-9093-9518); Somenzi, Fabio; Trivedi, Ashutosh; and Wojtczak, Dominik (ORCID: 0000-0001-5560-0546)
(2020) Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives. In: ATVA, 2020-10-19 - 2020-10-23.

ATVA2020.pdf - Author Accepted Manuscript

Abstract

Omega-regular properties, specified using linear time temporal logic or various forms of omega-automata, find increasing use in specifying the objectives of reinforcement learning (RL). The key problem that arises is that of faithful and effective translation of the objective into a scalar reward for model-free RL. A recent approach exploits Büchi automata with restricted nondeterminism to reduce the search for an optimal policy for an omega-regular property to that for a simple reachability objective. A possible drawback of this translation is that reachability rewards are sparse, being reaped only at the end of each episode. Another approach reduces the search for an optimal policy to an optimization problem with two interdependent discount parameters. While this approach provides denser rewards than the reduction to reachability, it is not easily mapped to off-the-shelf RL algorithms. We propose a reward scheme that reduces the search for an optimal policy to an optimization problem with a single discount parameter that produces dense rewards and is compatible with off-the-shelf RL algorithms. Finally, we report an experimental comparison of these and other reward schemes for model-free RL with omega-regular objectives.

Item Type: Conference or Workshop Item (Unspecified)
Uncontrolled Keywords: Basic Behavioral and Social Science, Behavioral and Social Science
Depositing User: Symplectic Admin
Date Deposited: 23 Jul 2020 08:11
Last Modified: 15 Mar 2024 00:25
DOI: 10.1007/978-3-030-59152-6_6
URI: https://livrepository.liverpool.ac.uk/id/eprint/3094897