Multi-objective reinforcement learning for guaranteeing alignment with multiple values

Manel Rodriguez-Soto, Roxana Radulescu, Juan A. Rodriguez-Aguilar, Maite Lopez-Sanchez, Ann Nowe

Abstract

In this paper, we address the problem of ensuring that autonomous learning agents are aligned with multiple moral values. Specifically, we present the theoretical principles and algorithmic tools needed to create an environment in which an agent is guaranteed to learn a behaviour (or policy) that is aligned with multiple moral values while still pursuing its individual objective. To address this value alignment problem, we adopt the Multi-Objective Reinforcement Learning framework and propose a novel algorithm that combines techniques from Multi-Objective Reinforcement Learning and Linear Programming. In addition to providing theoretical guarantees, we illustrate our value alignment process with an example involving an autonomous vehicle, in which we show that the agent learns to behave in alignment with the ethical values of safety, achievement, and comfort. Finally, we use a synthetic multi-objective environment generator to evaluate the computational cost of guaranteeing ethical learning as the number of values increases.