Deproost, S, Steckelmacher, D & Nowe, A 2024, Human-readable programs as the actor of a Reinforcement Learning agent using Critic-Moderated Evolution. in Proceedings of the 2024 Benelux Conference on Artificial Intelligence. BNAIC Proceedings, Benelux Association for Artificial Intelligence (BNVKI-AIABN), pp. 1-18, BNAIC/BeNeLearn 2024: Joint International Scientific Conferences on AI and Machine Learning, Utrecht, Netherlands, 18/11/24. <https://bnaic2024.sites.uu.nl/wp-content/uploads/sites/986/2024/10/Human-readable-programs-as-the-actor-of-a-Reinforcement-Learning-agent-using-critic-moderated-evolution.pdf>
@inproceedings{06925a1f1ec141c48d432ca36e4d988f,
title = "Human-readable programs as the actor of a Reinforcement Learning agent using Critic-Moderated Evolution",
abstract = "With Deep Reinforcement Learning (DRL) being increasingly considered for the control of real-world systems, the lack of transparency of the neural network at the core of RL becomes a concern. Programmatic Reinforcement Learning (PRL) is able to create representations of this black box in the form of source code, not only increasing the explainability of the controller but also allowing for user adaptations. However, these methods focus on distilling a black-box policy into a program and do so after learning, using the Mean Squared Error between produced and wanted behaviour and discarding other elements of the RL algorithm. The distilled policy may therefore perform significantly worse than the learned black-box policy. In this paper, we propose to directly learn a program as the policy of an RL agent. We build on TD3 and use its critics as the basis of the objective function of a genetic algorithm that synthesizes the program. Our approach builds the program during training, as opposed to after the fact. This steers the program towards actual high rewards, instead of a simple Mean Squared Error. Moreover, our approach leverages the TD3 critics to achieve high sample-efficiency, as opposed to pure genetic methods that rely on Monte-Carlo evaluations. Our experiments demonstrate the validity, explainability and sample-efficiency of our approach in a simple gridworld environment.",
keywords = "Reinforcement Learning, Explainable AI",
author = "Senne Deproost and Denis Steckelmacher and Ann Nowe",
year = "2024",
month = nov,
day = "18",
language = "English",
series = "BNAIC Proceedings",
publisher = "Benelux Association for Artificial Intelligence (BNVKI-AIABN)",
pages = "1--18",
booktitle = "Proceedings of the 2024 Benelux Conference on Artificial Intelligence",
note = "BNAIC/BeNeLearn 2024: Joint International Scientific Conferences on AI and Machine Learning; Conference date: 18-11-2024 through 20-11-2024",
url = "https://bnaic2024.sites.uu.nl/",
}