Mahmoud Ahmed Hassan Mohamed Elbarbari, Florent Delgrange, Ivo Vervlimmeren, Kyriakos Efthymiadis, Bram Vanderborght, Ann Nowé
Reinforcement Learning (RL) enables artificial agents to learn through direct interaction with the environment. However, it usually does not scale well to large problems due to its sample inefficiency. Reward shaping is a well-established approach that allows for more efficient learning by incorporating domain knowledge into RL agents via supplementary rewards. In this work, we propose a novel methodology that automatically generates reward-shaping functions from user-provided Linear Temporal Logic on finite traces (LTLf) formulas. LTLf in our work serves as a rich language that allows the user to communicate domain knowledge to the learning agent. In both single- and multi-agent settings, we demonstrate that our approach performs at least as well as the baseline approach while providing essential advantages in terms of flexibility and ease of use. We elaborate on some of these advantages empirically by demonstrating that our approach can handle domain knowledge with different levels of accuracy and provides the user with the flexibility to express aspects of uncertainty in the provided advice.
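The "supplementary rewards" mentioned in the abstract are usually built on classical potential-based reward shaping (Ng et al., 1999), which the paper's LTLf-derived shaping functions generalize. The sketch below is purely illustrative and not the paper's method; the `potential` function and its goal encoding are hypothetical stand-ins for user-provided domain knowledge.

```python
# Illustrative sketch of potential-based reward shaping, the classical
# mechanism behind "supplementary rewards". Phi (potential) is a
# user-supplied function encoding domain knowledge; the names and the
# toy 1-D goal below are hypothetical, not taken from the paper.

GAMMA = 0.99  # discount factor


def potential(state: int) -> float:
    # Hypothetical potential: higher (less negative) closer to the goal.
    goal = 10
    return -abs(goal - state)


def shaped_reward(reward: float, state: int, next_state: int,
                  gamma: float = GAMMA) -> float:
    # F(s, s') = gamma * Phi(s') - Phi(s); adding F to the environment
    # reward is known to preserve the set of optimal policies.
    return reward + gamma * potential(next_state) - potential(state)
```

For example, a transition that moves toward the hypothetical goal yields a positive shaping bonus, while moving away yields a penalty, steering exploration without changing the task's optimum.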
Elbarbari, MAHM, Delgrange, F, Vervlimmeren, I, Efthymiadis, K, Vanderborght, B & Nowé, A 2025, 'A Framework for Flexibly Guiding Learning Agents', Neural Computing & Applications, vol. 37, no. 19, 1, pp. 13101-13117. https://doi.org/10.1007/s00521-022-07396-x
Elbarbari, M. A. H. M., Delgrange, F., Vervlimmeren, I., Efthymiadis, K., Vanderborght, B., & Nowé, A. (2025). A Framework for Flexibly Guiding Learning Agents. Neural Computing & Applications, 37(19), 13101-13117. Article 1. https://doi.org/10.1007/s00521-022-07396-x
@article{bf4975ae0cc04ec3a3f2403a8a91072e,
title = "A Framework for Flexibly Guiding Learning Agents",
abstract = "Reinforcement Learning (RL) enables artificial agents to learn through direct interaction with the environment. However, it usually does not scale well to large problems due to its sample inefficiency. Reward shaping is a well-established approach that allows for more efficient learning by incorporating domain knowledge into RL agents via supplementary rewards. In this work, we propose a novel methodology that automatically generates reward-shaping functions from user-provided Linear Temporal Logic on finite traces (LTLf) formulas. LTLf in our work serves as a rich language that allows the user to communicate domain knowledge to the learning agent. In both single- and multi-agent settings, we demonstrate that our approach performs at least as well as the baseline approach while providing essential advantages in terms of flexibility and ease of use. We elaborate on some of these advantages empirically by demonstrating that our approach can handle domain knowledge with different levels of accuracy and provides the user with the flexibility to express aspects of uncertainty in the provided advice.",
keywords = "Reinforcement Learning, Reward Shaping, Linear Temporal Logic on finite traces, Multi-agent Systems",
author = "Elbarbari, {Mahmoud Ahmed Hassan Mohamed} and Florent Delgrange and Ivo Vervlimmeren and Kyriakos Efthymiadis and Bram Vanderborght and Ann Now{\'e}",
note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022.",
year = "2025",
month = jul,
doi = "10.1007/s00521-022-07396-x",
language = "English",
volume = "37",
pages = "13101--13117",
journal = "Neural Computing \& Applications",
issn = "0941-0643",
publisher = "Springer London",
number = "19",
}