Hahn, Ernst Moritz, Perez, Mateo, Schewe, Sven (ORCID: 0000-0002-9093-9518), Somenzi, Fabio, Trivedi, Ashutosh and Wojtczak, Dominik (ORCID: 0000-0001-5560-0546) (2023) Multi-objective ω-Regular Reinforcement Learning. Formal Aspects of Computing, 35 (2). pp. 1-24.
Abstract
The expanding role of reinforcement learning (RL) in safety-critical system design has promoted ω-automata as a way to express learning requirements, often non-Markovian, with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision-making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to express relative preferences over multiple objectives are: (1) *weighted preference*, where the decision maker provides scalar weights for various objectives, and (2) *lexicographic preference*, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple ω-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple ω-regular objectives to a scalar reward signal that is both *faithful* (maximising reward means maximising probability of achieving the objectives under the corresponding preference) and *effective* (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, Mungojerrie, and we present an experimental evaluation of our technique on benchmark learning problems.
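The difference between the two preference models in the abstract can be illustrated with a small sketch. It is not the article's translation to a scalar reward, and it does not use Mungojerrie; it only compares hypothetical strategies by their (assumed, already known) probabilities of satisfying each objective. The strategy names, probabilities, weights, and function names are all illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple


def weighted_value(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted preference: collapse per-objective probabilities into one score."""
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per objective")
    return sum(w * p for w, p in zip(weights, probs))


def best_weighted(candidates: Dict[str, Tuple[float, ...]],
                  weights: Sequence[float]) -> str:
    """Return the candidate name with the highest weighted score."""
    return max(candidates, key=lambda name: weighted_value(candidates[name], weights))


def best_lexicographic(candidates: Dict[str, Tuple[float, ...]]) -> str:
    """Lexicographic preference: compare objective 1 first, then objective 2, ...

    Python compares tuples lexicographically, so any gain on an earlier
    objective outweighs any gain on a later one.
    """
    return max(candidates, key=lambda name: tuple(candidates[name]))


# Hypothetical strategies mapped to assumed satisfaction probabilities
# for (objective 1, objective 2), with objective 1 ranked higher.
candidates = {
    "a": (0.90, 0.10),
    "b": (0.85, 0.95),
}

if __name__ == "__main__":
    print(best_weighted(candidates, (0.5, 0.5)))
    print(best_lexicographic(candidates))
```

With equal weights, strategy "b" is preferred because its large gain on the second objective outweighs a small loss on the first; under the lexicographic order, "a" is preferred because it does better on the higher-ranked objective, regardless of the second.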
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Multi-objective reinforcement learning, omega-regular objectives, lexicographic preference, weighted preference, automata-theoretic reinforcement learning |
Divisions: | Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science |
Depositing User: | Symplectic Admin |
Date Deposited: | 25 Sep 2023 14:43 |
Last Modified: | 19 Oct 2023 09:34 |
DOI: | 10.1145/3605950 |
Open Access URL: | https://doi.org/10.1145/3605950 |
URI: | https://livrepository.liverpool.ac.uk/id/eprint/3173031 |