Hahn, Ernst Moritz ORCID: 0000-0002-9348-7684, Perez, Mateo ORCID: 0000-0003-4220-3212, Schewe, Sven ORCID: 0000-0002-9093-9518, Somenzi, Fabio ORCID: 0000-0002-2085-2003, Trivedi, Ashutosh ORCID: 0000-0001-9346-0126 and Wojtczak, Dominik ORCID: 0000-0001-5560-0546
(2023)
Mungojerrie: Linear-Time Objectives in Model-Free Reinforcement Learning.
In: Lecture Notes in Computer Science, 13993.
Springer Nature Switzerland, pp. 527-545.
ISBN 9783031308222
Abstract
Mungojerrie is an extensible tool that provides a framework to translate linear-time objectives into reward for reinforcement learning (RL). The tool provides convergent RL algorithms for stochastic games, reference implementations of existing reward translations for ω-regular objectives, and an internal probabilistic model checker for ω-regular objectives. This functionality is modular and operates on shared data structures, which enables fast development of new translation techniques. Mungojerrie supports finite models specified in PRISM and ω-automata specified in the HOA format, with an integrated command line interface to external linear temporal logic translators. Mungojerrie is distributed with a set of benchmarks for ω-regular objectives in RL.
Item Type: | Book Section |
---|---|
Uncontrolled Keywords: | 46 Information and Computing Sciences, 4602 Artificial Intelligence, 4611 Machine Learning, Basic Behavioral and Social Science, Behavioral and Social Science |
Divisions: | Faculty of Science and Engineering > School of Electrical Engineering, Electronics and Computer Science |
Depositing User: | Symplectic Admin |
Date Deposited: | 20 Jul 2023 12:44 |
Last Modified: | 12 Aug 2024 17:37 |
DOI: | 10.1007/978-3-031-30823-9_27 |
Open Access URL: | https://link.springer.com/chapter/10.1007/978-3-03... |
URI: | https://livrepository.liverpool.ac.uk/id/eprint/3171810 |