Publication Details
 
 
Plisnier, Helene
 

Thesis

Abstract 

Reinforcement Learning (RL) is a Machine Learning method that mimics the way humans learn to perform new tasks, particularly when learning involves a great deal of trial and error. By nature, it is a progressive process that requires many interactions with the environment, and therefore time, before it can exhibit satisfactory behavior. Consequently, an important challenge faced by current RL techniques is sample-efficiency. While most approaches to reducing the number of samples needed focus on extracting as much information as possible from each sample, we explore ways to improve the exploration strategy of the learner. Pushing the learning agent towards fruitful areas of the search space, and preventing it from wasting time in undesirable areas, helps the agent reach a good policy faster and more efficiently. We present the Actor-Advisor, a general-purpose Policy Shaping method that allows an external advisory policy to influence the actions selected by an RL agent. We extend this main contribution to a wide range of settings, such as discrete and continuous action spaces, using on- or off-policy Reinforcement Learning algorithms. We design a learning correction that lets Policy Gradient-based methods benefit from off-policy external guidance, despite their strongly on-policy nature. We evaluate the Actor-Advisor in two important RL sub-fields: learning from human intervention, and Transfer Learning. Although almost any source can be used as an advisor of an RL agent following the Actor-Advisor framework, the focus of this thesis is on applying the Actor-Advisor to several novel Transfer Learning problems. Transfer Learning is closely tied to sample-efficiency, since it aims to make the learning of new tasks faster by smartly reusing knowledge acquired in previous tasks.
Finally, we introduce Self-Transfer, a learning trick inspired by Transfer Learning, in which an RL agent can easily improve its sample-efficiency by using an advisor pre-trained for a short while on the same task. We hope that our contributions will help promote the use of Reinforcement Learning methods in future real-life problems.
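
The abstract does not say how the advisory policy is combined with the agent's own policy. As a rough illustration only, the sketch below assumes a discrete action space and a multiplicative mixing rule, a form commonly used in the Policy Shaping literature: the two action distributions are multiplied element-wise and renormalized. The function names `shape_policy` and `sample_action`, and the fallback to the actor's distribution when the product is all zeros, are hypothetical choices made for this example and are not taken from the thesis.

```python
import random


def shape_policy(actor_probs, advisor_probs):
    """Mix two discrete action distributions by element-wise product.

    Assumption for illustration: both inputs are probability vectors over
    the same discrete action set. This is not claimed to be the exact rule
    used by the Actor-Advisor.
    """
    if len(actor_probs) != len(advisor_probs):
        raise ValueError("actor and advisor must cover the same actions")

    # Actions favored by both the actor and the advisor get the most mass.
    product = [a * b for a, b in zip(actor_probs, advisor_probs)]
    total = sum(product)

    if total <= 0.0:
        # The two policies share no supported action; as a hypothetical
        # fallback, ignore the advice and keep the actor's distribution.
        return list(actor_probs)

    return [p / total for p in product]


def sample_action(probs, rng=random):
    """Draw one action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    actor = [0.5, 0.3, 0.2]      # what the learner currently prefers
    advisor = [0.1, 0.8, 0.1]    # external advice, e.g. a pre-trained policy
    shaped = shape_policy(actor, advisor)
    print("shaped policy:", shaped)
    print("sampled action:", sample_action(shaped))
```

Under this rule, a uniform advisor leaves the actor's distribution unchanged, while a confident advisor shifts exploration towards the actions it recommends. In a Self-Transfer-style setup, the advisor vector would come from a policy briefly pre-trained on the same task.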

Reference 
 
 
VUB