Florent Delgrange, Ann Nowé, Guillermo A. Pérez
We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
Delgrange, F, Nowé, A & Pérez, GA 2022, Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes. in Proceedings of the AAAI Conference on Artificial Intelligence: Vol. 36 No. 6: AAAI-22 Technical Tracks 6. First edn, vol. 36, Proceedings of the AAAI Conference on Artificial Intelligence, no. 6, vol. 36, AAAI Press, Palo Alto, California USA, pp. 6497-6505, 36th AAAI Conference on Artificial Intelligence, 22/02/22. https://doi.org/10.1609/aaai.v36i6.20602
Delgrange, F., Nowé, A., & Pérez, G. A. (2022). Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes. In Proceedings of the AAAI Conference on Artificial Intelligence: Vol. 36 No. 6: AAAI-22 Technical Tracks 6 (First ed., Vol. 36, pp. 6497-6505). (Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 36, No. 6). AAAI Press. https://doi.org/10.1609/aaai.v36i6.20602
@inproceedings{91d687742e234d04a7fec29a7ad595f5,
title = "Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes",
abstract = "We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.",
keywords = "Machine Learning, Artificial Intelligence, Formal Methods, Reinforcement Learning, Knowledge Representation And Reasoning, Reasoning Under Uncertainty",
author = "Florent Delgrange and Ann Now{\'e} and P{\'e}rez, {Guillermo A.}",
note = "Funding Information: This research received funding from the Flemish Government (AI Research Program) and was supported by the DESCARTES iBOF project. G.A. Perez is also supported by the Belgian FWO {\textquoteleft}SAILor{\textquoteright} project (G030020N). Publisher Copyright: Copyright {\textcopyright} 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 36th AAAI Conference on Artificial Intelligence, AAAI ; Conference date: 22-02-2022 Through 01-03-2022",
year = "2022",
month = jun,
day = "28",
doi = "10.1609/aaai.v36i6.20602",
language = "English",
isbn = "1-57735-876-7",
volume = "36",
series = "Proceedings of the AAAI Conference on Artificial Intelligence",
publisher = "AAAI Press",
number = "6",
pages = "6497--6505",
booktitle = "Proceedings of the AAAI Conference on Artificial Intelligence",
edition = "First",
url = "https://aaai.org/Conferences/AAAI-22/",
}