Offline Reinforcement Learning makes it possible to learn a controller for a system from a history of states, actions and rewards, without requiring any interaction with the system or a simulator of it. Current Offline RL approaches mainly build on Off-policy RL algorithms, such as Q-Learning or TD3, with small extensions that prevent the algorithm from diverging due to its inability to try actions in real time. In this paper, we observe that these incremental approaches mostly lead to low-quality and untrustworthy policies. We then propose an Offline RL method built from the ground up, based on inferring a discrete-state and discrete-action MDP from the continuous states and actions in the dataset, and then solving the discrete MDP with Value Iteration. Our empirical evaluation shows the promise of our approach and calls for more research in Offline RL with dedicated algorithms.
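The pipeline described in the abstract lends itself to a short illustration. Below is a minimal Python sketch, assuming k-means is used to discretize the continuous states and actions and that the discrete MDP is estimated from transition counts; every name and hyper-parameter here (infer_discrete_mdp, value_iteration, n_state_bins, n_action_bins, gamma) is illustrative and not taken from the paper, whose actual discretization procedure may differ.

# Hypothetical sketch: discretize a continuous offline dataset with
# k-means, estimate a tabular MDP from counts, then solve it with
# Value Iteration. Illustrative only, not the authors' implementation.
import numpy as np
from sklearn.cluster import KMeans

def infer_discrete_mdp(states, actions, rewards, next_states,
                       n_state_bins=50, n_action_bins=8):
    # Cluster continuous states and actions into discrete indices.
    state_km = KMeans(n_clusters=n_state_bins, n_init=10)
    state_km.fit(np.vstack([states, next_states]))
    action_km = KMeans(n_clusters=n_action_bins, n_init=10).fit(actions)
    s, s_next = state_km.predict(states), state_km.predict(next_states)
    a = action_km.predict(actions)

    # Count-based estimates of transition probabilities and mean rewards.
    T = np.zeros((n_state_bins, n_action_bins, n_state_bins))
    R = np.zeros((n_state_bins, n_action_bins))
    N = np.zeros((n_state_bins, n_action_bins))
    for si, ai, ri, sj in zip(s, a, rewards, s_next):
        T[si, ai, sj] += 1.0
        R[si, ai] += ri
        N[si, ai] += 1.0
    visited = N > 0  # state-action pairs supported by the dataset
    T[visited] /= N[visited][:, None]
    R[visited] /= N[visited]
    return T, R, visited

def value_iteration(T, R, visited, gamma=0.99, n_iters=1000):
    # Standard Value Iteration on the inferred tabular MDP. Unvisited
    # state-action pairs are masked out so the greedy policy only
    # picks actions that the dataset actually supports.
    Q = np.zeros(R.shape)
    for _ in range(n_iters):
        V = np.where(visited, Q, -np.inf).max(axis=1)
        V = np.where(np.isfinite(V), V, 0.0)  # states with no data
        Q = R + gamma * (T @ V)
    return np.where(visited, Q, -np.inf).argmax(axis=1)  # greedy policy

Because the learned policy is a small lookup table over state clusters, every decision can be traced back to the dataset transitions behind the corresponding counts, which is the kind of inspectability the abstract refers to.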
Steckelmacher, D & Nowe, A 2024, Trustworthy and Explainable Offline Reinforcement Learning by Inferring a Discrete-State Discrete-Action MDP from a Continous-State Continuous-Action dataset . in Proceedings of the 2024 Benelux Conference on Artificial Intelligence. BNAIC Proceedings, Benelux Association for Artificial Intelligence (BNVKI-AIABN), BNAIC/BeNeLearn 2024: Joint International Scientific Conferences on AI and Machine Learning, Utrecht, Netherlands, 18/11/24 . < https://bnaic2024.sites.uu.nl/wp-content/uploads/sites/986/2024/10/Trustworthy-and-Explainable-Offline-Reinforcement-Learning-by-Inferring-a-Discrete-State-Discrete-Action-MDP-from-a-Continous-State-Continuous-Action-dataset.pdf >
Steckelmacher, D. , & Nowe, A. (2024). Trustworthy and Explainable Offline Reinforcement Learning by Inferring a Discrete-State Discrete-Action MDP from a Continous-State Continuous-Action dataset . In Proceedings of the 2024 Benelux Conference on Artificial Intelligence (BNAIC Proceedings). Benelux Association for Artificial Intelligence (BNVKI-AIABN). https://bnaic2024.sites.uu.nl/wp-content/uploads/sites/986/2024/10/Trustworthy-and-Explainable-Offline-Reinforcement-Learning-by-Inferring-a-Discrete-State-Discrete-Action-MDP-from-a-Continous-State-Continuous-Action-dataset.pdf
@inproceedings{6ad805b16a71446d8dbc5cfc8e16f788,
  title = "Trustworthy and Explainable Offline Reinforcement Learning by Inferring a Discrete-State Discrete-Action MDP from a Continous-State Continuous-Action dataset",
  abstract = "Offline Reinforcement Learning makes it possible to learn a controller for a system from a history of states, actions and rewards, without requiring any interaction with the system or a simulator of it. Current Offline RL approaches mainly build on Off-policy RL algorithms, such as Q-Learning or TD3, with small extensions that prevent the algorithm from diverging due to its inability to try actions in real time. In this paper, we observe that these incremental approaches mostly lead to low-quality and untrustworthy policies. We then propose an Offline RL method built from the ground up, based on inferring a discrete-state and discrete-action MDP from the continuous states and actions in the dataset, and then solving the discrete MDP with Value Iteration. Our empirical evaluation shows the promise of our approach and calls for more research in Offline RL with dedicated algorithms.",
  author = "Denis Steckelmacher and Ann Nowe",
  year = "2024",
  month = nov,
  day = "18",
  language = "English",
  series = "BNAIC Proceedings",
  publisher = "Benelux Association for Artificial Intelligence (BNVKI-AIABN)",
  booktitle = "Proceedings of the 2024 Benelux Conference on Artificial Intelligence",
  note = "BNAIC/BeNeLearn 2024: Joint International Scientific Conferences on AI and Machine Learning; conference dates: 18-11-2024 through 20-11-2024",
  url = "https://bnaic2024.sites.uu.nl/",
}