Multi-objective ω-Regular Reinforcement Learning



Hahn, Ernst Moritz; Perez, Mateo; Schewe, Sven (ORCID: 0000-0002-9093-9518); Somenzi, Fabio; Trivedi, Ashutosh and Wojtczak, Dominik (ORCID: 0000-0001-5560-0546)
(2023) Multi-objective ω-Regular Reinforcement Learning. Formal Aspects of Computing, 35 (2). pp. 1-24.


Abstract

The expanding role of reinforcement learning (RL) in safety-critical system design has promoted ω-automata as a way to express learning requirements (often non-Markovian) with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision-making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to express relative preferences over multiple objectives are: (1) weighted preference, where the decision maker provides scalar weights for various objectives, and (2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple ω-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple ω-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, Mungojerrie, and we present an experimental evaluation of our technique on benchmark learning problems.
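
To make the two preference models concrete, the following is a minimal sketch that is not taken from the article or from Mungojerrie. It assumes each candidate strategy has already been summarised by a hypothetical vector of satisfaction probabilities, one per objective, and shows how weighted and lexicographic preferences can select different strategies from the same candidates. All names and numbers are illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical satisfaction probabilities per strategy, one entry per objective,
# listed from highest to lowest priority. The numbers are made up for illustration.
Candidates = Dict[str, Tuple[float, ...]]


def best_weighted(candidates: Candidates, weights: Sequence[float]) -> str:
    """Pick the strategy maximising the weighted sum of satisfaction probabilities."""
    def score(name: str) -> float:
        return sum(w * p for w, p in zip(weights, candidates[name]))
    return max(candidates, key=score)


def best_lexicographic(candidates: Candidates) -> str:
    """Pick the strategy maximising the first objective, breaking ties by later ones."""
    # Python compares tuples element by element, which matches a lexicographic order.
    return max(candidates, key=lambda name: candidates[name])


candidates = {
    "cautious": (0.90, 0.10),
    "balanced": (0.80, 0.70),
}
weighted_choice = best_weighted(candidates, weights=(0.5, 0.5))
lexicographic_choice = best_lexicographic(candidates)
print(weighted_choice, lexicographic_choice)  # prints "balanced cautious"
```

With equal weights the sketch selects "balanced", whereas the lexicographic rule selects "cautious", because any gain in the higher-priority objective outweighs any loss in a lower-priority one.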

Item Type: Article
Uncontrolled Keywords: Multi-objective reinforcement learning, omega-regular objectives, lexicographic preference, weighted preference, automata-theoretic reinforcement learning
Divisions: Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science
Depositing User: Symplectic Admin
Date Deposited: 25 Sep 2023 14:43
Last Modified: 19 Oct 2023 09:34
DOI: 10.1145/3605950
Open Access URL: https://doi.org/10.1145/3605950
URI: https://livrepository.liverpool.ac.uk/id/eprint/3173031