@inproceedings{7d808fb4dfc34ee0babbbd14ba1d1bc9,
title = "MoveRL: To A Safer Robotic Reinforcement Learning Environment",
abstract = "The deployment of Reinforcement Learning (RL) on physical robots still stumbles on several challenges, such as sample-efficiency, safety, reproducibility, cost, and software platforms. In this paper, we introduce MoveRL, an environment that exposes a standard OpenAI Gym interface and allows any off-the-shelf RL agent to control a robot built on ROS, the Robot Operating System. ROS is the standard abstraction layer used by roboticists, and allows users to observe and control both simulated and physical robots. By providing a bridge between the Gym and ROS, our environment allows easy evaluation of RL algorithms in highly-accurate simulators or on real-world robots, without any software changes. In addition to a Gym-ROS bridge, our environment also leverages MoveIt, a state-of-the-art collision-aware robot motion planner, to prevent the RL agent from executing actions that would lead to a collision. Our experimental results show that a standard PPO agent is able to control a simulated commercial robot arm in an environment with moving obstacles, while almost perfectly avoiding collisions even in the early stages of learning. We also show that the use of MoveIt slightly increases the sample-efficiency of the RL agent. Combined, these results show that RL on robots is possible in a safe way, and that it is possible to leverage state-of-the-art robotic techniques to improve how an RL agent learns. We hope that our environment will allow more (future) RL algorithms to be evaluated on commercial robotic tasks.",
keywords = "Robotic, Safe Reinforcement Learning, Path Planning",
author = "Gaoyuan Liu and {De Winter}, Joris and Bram Vanderborght and Ann Now{\'e} and Denis Steckelmacher",
year = "2022",
doi = "10.1007/978-3-030-93842-0_14",
language = "English",
isbn = "978-3-030-93841-3",
volume = "1530",
series = "Communications in Computer and Information Science",
publisher = "Springer",
pages = "239--253",
editor = "Leiva, {Luis A.} and C{\'e}dric Pruski and R{\'e}ka Markovich and Amro Najjar and Christoph Schommer",
booktitle = "The 33rd Benelux Conference on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine Learning (BNAIC/BENELEARN 2021)",
note = "33rd Benelux Conference on Artificial Intelligence and 30th Belgian-Dutch Conference on Machine Learning, BNAIC/BeneLearn 2021 ; Conference date: 10-11-2021 Through 12-11-2021",
url = "https://bnaic2021.uni.lu/",
}