Diederik M. Roijers, Luisa M. Zintgraf, Pieter Libin, Mathieu Reymond, Eugenio Bargiacchi, Ann Nowe
In interactive multi-objective reinforcement learning (MORL), an agent has to simultaneously learn about the environment and the preferences of the user, in order to quickly zoom in on those decisions that are likely to be preferred by the user. In this paper we study interactive MORL in the context of multi-objective multi-armed bandits. Contrary to earlier approaches to interactive MORL that force the utility of the user to be expressed as a weighted sum of the values for each objective, we do not make such stringent a priori assumptions. Specifically, we not only allow non-linear preferences, but also obviate the need to specify the exact model class in which the utility function must fall. To achieve this, we propose a new approach called Gaussian-process Utility Thompson Sampling (GUTS). GUTS employs parameterless Bayesian learning to allow any type of utility function, exploits monotonicity information, and limits the number of queries posed to the user by ensuring that questions are statistically significant. We show empirically that GUTS can learn non-linear preferences, and that the regret and number of queries posed to the user are highly sub-linear in the number of arm pulls. (A preliminary version of this work was presented at the ALA workshop in 2018 [20].)
Roijers, DM, Zintgraf, LM, Libin, P, Reymond, M, Bargiacchi, E & Nowe, A 2021, Interactive Multi-Objective Reinforcement Learning in Multi-Armed Bandits with Gaussian Process Utility Models. in F Hutter, K Kersting, J Lijffijt & I Valera (eds), ECML-PKDD 2020: Proceedings of the 2020 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Springer, ECML PKDD: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Ghent, Belgium, 14/09/20. https://doi.org/10.1007/978-3-030-67664-3_28
Roijers, D. M., Zintgraf, L. M., Libin, P., Reymond, M., Bargiacchi, E., & Nowe, A. (2021). Interactive Multi-Objective Reinforcement Learning in Multi-Armed Bandits with Gaussian Process Utility Models. In F. Hutter, K. Kersting, J. Lijffijt, & I. Valera (Eds.), ECML-PKDD 2020: Proceedings of the 2020 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Springer. https://doi.org/10.1007/978-3-030-67664-3_28
@inproceedings{3e59774ab5854c0fb919d58aee74dbbe,
title = "Interactive Multi-Objective Reinforcement Learning in Multi-Armed Bandits with Gaussian Process Utility Models",
abstract = "In interactive multi-objective reinforcement learning (MORL), an agent has to simultaneously learn about the environment and the preferences of the user, in order to quickly zoom in on those decisions that are likely to be preferred by the user. In this paper we study interactive MORL in the context of multi-objective multi-armed bandits. Contrary to earlier approaches to interactive MORL that force the utility of the user to be expressed as a weighted sum of the values for each objective, we do not make such stringent a priori assumptions. Specifically, we not only allow non-linear preferences, but also obviate the need to specify the exact model class in which the utility function must fall. To achieve this, we propose a new approach called Gaussian-process Utility Thompson Sampling (GUTS). GUTS employs parameterless Bayesian learning to allow any type of utility function, exploits monotonicity information, and limits the number of queries posed to the user by ensuring that questions are statistically significant. We show empirically that GUTS can learn non-linear preferences, and that the regret and number of queries posed to the user are highly sub-linear in the number of arm pulls. (A preliminary version of this work was presented at the ALA workshop in 2018 [20].)",
author = "Roijers, {Diederik M.} and Zintgraf, {Luisa M.} and Pieter Libin and Mathieu Reymond and Eugenio Bargiacchi and Ann Nowe",
year = "2021",
doi = "10.1007/978-3-030-67664-3_28",
language = "English",
isbn = "978-3-030-67664-3",
editor = "Frank Hutter and Kristian Kersting and Jefrey Lijffijt and Isabel Valera",
booktitle = "ECML-PKDD 2020: Proceedings of the 2020 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases",
publisher = "Springer",
note = "ECML PKDD: Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Conference date: 14-09-2020 through 18-09-2020",
}