Publication Details
Overview
 
 
Denis Steckelmacher, Ann Nowé
 

Proceedings of the 2024 Benelux Conference on Artificial Intelligence

Contribution to Book/Anthology

Abstract 

Offline Reinforcement Learning makes it possible to learn a controller for a system from a history of states, actions and rewards, without requiring interaction with the system or a simulator of it. Current Offline RL approaches mainly build on off-policy RL algorithms, such as Q-Learning or TD3, with small extensions that prevent the algorithm from diverging when it cannot try actions in real time. In this paper, we observe that these incremental approaches mostly lead to low-quality and untrustworthy policies. We then propose an Offline RL method built from the ground up, based on inferring a discrete-state and discrete-action MDP from the continuous states and actions in the dataset, and then solving the discrete MDP with Value Iteration. Our empirical evaluation shows the promise of our approach, and calls for more research on dedicated Offline RL algorithms.
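To illustrate the two-step idea sketched in the abstract (infer a discrete MDP from continuous logged data, then solve it with Value Iteration), here is a minimal Python/NumPy sketch. The uniform binning, the toy dataset, and all names below are our own assumptions for illustration, not the paper's actual inference procedure.

```python
import numpy as np

# Hypothetical offline dataset of (state, action, reward, next_state) tuples;
# the real method would read these from a logged history.
rng = np.random.default_rng(0)
N = 5000
states = rng.uniform(-1, 1, size=(N, 2))       # continuous states
actions = rng.uniform(-1, 1, size=(N, 1))      # continuous actions
rewards = -np.linalg.norm(states, axis=1)      # toy reward signal
next_states = np.clip(states + 0.1 * actions, -1, 1)  # toy dynamics

def discretize(x, bins):
    """Map continuous vectors in [-1, 1]^D to flat integer cell indices."""
    idx = np.clip(((x + 1) / 2 * bins).astype(int), 0, bins - 1)
    flat = np.zeros(len(x), dtype=int)
    for d in range(x.shape[1]):
        flat = flat * bins + idx[:, d]
    return flat

S_BINS, A_BINS = 8, 4
s = discretize(states, S_BINS)
a = discretize(actions, A_BINS)
s2 = discretize(next_states, S_BINS)
nS, nA = S_BINS ** states.shape[1], A_BINS ** actions.shape[1]

# Infer the discrete MDP: empirical transition probabilities and mean rewards.
counts = np.zeros((nS, nA, nS))
R = np.zeros((nS, nA))
visits = np.zeros((nS, nA))
np.add.at(counts, (s, a, s2), 1.0)
np.add.at(R, (s, a), rewards)
np.add.at(visits, (s, a), 1.0)
seen = visits > 0
R[seen] /= visits[seen]
P = counts / np.maximum(visits, 1)[..., None]

# Solve the inferred MDP with Value Iteration.
gamma = 0.95
V = np.zeros(nS)
for _ in range(500):
    Q = R + gamma * (P @ V)        # shape (nS, nA)
    Q[~seen] = -np.inf             # never pick actions absent from the data
    V_new = Q.max(axis=1)
    V_new[~seen.any(axis=1)] = 0.0 # states with no data keep value 0
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new

# Greedy policy over discrete actions (argmax is arbitrary for unseen states).
policy = Q.argmax(axis=1)
```

Masking state-action pairs that never appear in the dataset is one simple way to keep Value Iteration from exploiting transitions it has no evidence for; the paper's actual MDP-inference and regularization choices may well differ.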
