Reinforcement Learning usually does not scale well to large problems: it typically takes a Reinforcement Learning agent many trials to reach a satisfactory policy. A main contributing factor is that Reinforcement Learning agents often learn exclusively by trial and error. Much work has addressed incorporating domain knowledge into Reinforcement Learning to enable more efficient learning. Reward shaping is a well-established method for incorporating domain knowledge into Reinforcement Learning by providing the learning agent with a supplementary reward. In this work we propose a novel methodology that automatically generates reward shaping functions from user-provided Linear Temporal Logic formulas. Linear Temporal Logic serves as a rich yet compact language that allows the user to express domain knowledge with minimal effort. Linear Temporal Logic is also relatively easy to express in natural language, which makes it accessible to non-expert users. We use the flag-collection domain to demonstrate empirically the increase in both the convergence speed and the quality of the learned policy, despite the minimal domain knowledge provided.
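To make the idea concrete, the sketch below illustrates potential-based reward shaping driven by progress through an automaton, in the spirit the abstract describes. This is a minimal illustration, not the paper's implementation: it assumes the standard potential-based form F(s, s') = γΦ(s') − Φ(s) of Ng et al. (1999), and the DFA, the potential function, and all names (FLAG_DFA, dfa_step, potential, shaped_reward) are hypothetical stand-ins for whatever the paper compiles from an LTLf formula.

```python
# Minimal sketch: potential-based reward shaping from automaton progress.
# All names and the DFA below are hypothetical illustrations, not the
# construction used in the paper.

GAMMA = 0.99

# Hypothetical DFA for the LTLf formula "F flag_a & F flag_b":
# state 0 = nothing collected, 1 = flag_a collected, 2 = both (accepting).
FLAG_DFA = {
    (0, "flag_a"): 1,
    (1, "flag_b"): 2,
}

def dfa_step(q, label):
    """Advance the DFA on an observed label; stay put if no transition fires."""
    return FLAG_DFA.get((q, label), q)

def potential(q, num_states=3):
    """Potential grows with progress toward the accepting DFA state."""
    return q / (num_states - 1)

def shaped_reward(env_reward, q, q_next):
    """Potential-based shaping: r + gamma * phi(q') - phi(q)."""
    return env_reward + GAMMA * potential(q_next) - potential(q)

# Example: collecting flag_a advances the DFA and yields a positive bonus.
q = 0
q_next = dfa_step(q, "flag_a")        # -> state 1
print(shaped_reward(0.0, q, q_next))  # 0.0 + 0.99 * 0.5 - 0.0 = 0.495
```

Because the shaping term is a potential difference, it rewards progress through the automaton without altering the optimal policy of the underlying task, which is the usual motivation for the potential-based form.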
Elbarbari, M, Efthymiadis, K, Vanderborght, B & Nowe, A 2021, LTLf-based Reward Shaping for Reinforcement Learning. in Proceedings of the Adaptive and Learning Agents Workshop 2021 (ALA2021) at AAMAS. Adaptive and Learning Agents Workshop 2021, London, United Kingdom, 3/05/21. <https://ala2021.vub.ac.be/papers/ALA2021_paper_55.pdf>
Elbarbari, M., Efthymiadis, K., Vanderborght, B., & Nowe, A. (2021). LTLf-based Reward Shaping for Reinforcement Learning. In Proceedings of the Adaptive and Learning Agents Workshop 2021 (ALA2021) at AAMAS. https://ala2021.vub.ac.be/papers/ALA2021_paper_55.pdf
@inproceedings{e0f6c995777d4b30a90e60368c234c05,
title = "LTLf-based Reward Shaping for Reinforcement Learning",
abstract = "Reinforcement Learning usually does not scale well to large problems: it typically takes a Reinforcement Learning agent many trials to reach a satisfactory policy. A main contributing factor is that Reinforcement Learning agents often learn exclusively by trial and error. Much work has addressed incorporating domain knowledge into Reinforcement Learning to enable more efficient learning. Reward shaping is a well-established method for incorporating domain knowledge into Reinforcement Learning by providing the learning agent with a supplementary reward. In this work we propose a novel methodology that automatically generates reward shaping functions from user-provided Linear Temporal Logic formulas. Linear Temporal Logic serves as a rich yet compact language that allows the user to express domain knowledge with minimal effort. Linear Temporal Logic is also relatively easy to express in natural language, which makes it accessible to non-expert users. We use the flag-collection domain to demonstrate empirically the increase in both the convergence speed and the quality of the learned policy, despite the minimal domain knowledge provided.",
keywords = "Reinforcement Learning, Reward Shaping, Linear Temporal Logic on finite traces",
author = "Mahmoud Elbarbari and Kyriakos Efthymiadis and Bram Vanderborght and Ann Nowe",
year = "2021",
month = apr,
day = "27",
language = "English",
booktitle = "Proceedings of the Adaptive and Learning Agents Workshop 2021 (ALA2021) at AAMAS",
note = "Adaptive and Learning Agents Workshop 2021: at AAMAS, ALA2021; Conference date: 03-05-2021 through 04-05-2021",
url = "https://ala2021.vub.ac.be",
}