Publication Details
Roxana Radulescu



The prevalence of artificial agents in our world raises the need to ensure that they can handle the salient properties of their environment, in order to plan or learn how to solve specific tasks. A first important aspect is that real-world problems are rarely restricted to one agent, and often involve multiple agents acting in the same environment. Such settings have already proven challenging to solve; examples include traffic systems, electricity grids, and warehouse management. Furthermore, the majority of multi-agent system implementations aim to optimise the agents' behaviour with respect to a single objective, despite the fact that many problem domains inherently involve multiple objectives. Taking a multi-objective perspective on decision-making problems allows complex trade-offs to be managed explicitly. For example, supply chain management involves a complex coordination process for optimising the information and material flow between all components of the supply chain, while minimising overall costs and complying with the conflicting demands of the involved partners (e.g., reducing warehouse holding costs while maintaining sufficient inventory to fulfil sales demands).

In this work, we focus on these aspects and discuss how the decision-making and learning processes of artificial agents can be formalised and approached when multiple agents are involved and multiple objectives must be considered. To analyse such problems, we adopt a utility-based perspective and advocate that compromises between competing objectives should be made on the basis of the utility that these compromises have for the users; in other words, they should depend on the desirability of the outcomes. Our analysis of the multi-objective multi-agent decision-making (MOMADM) domain revealed that the field to date has been quite fractured.
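The utility-based perspective can be illustrated with a minimal sketch: an agent faces vector-valued outcomes and selects between them by applying a user-specific utility function. The scenario, outcome names, numbers, and the linear utility form below are all illustrative assumptions, not taken from the thesis.

```python
# Hypothetical vector-valued outcomes for a stock-level decision:
# each outcome is (holding cost, fulfilled-demand rate). Names and
# numbers are made up for illustration.
outcomes = {
    "large_stock": (10.0, 0.99),
    "lean_stock":  (2.0, 0.85),
}

def utility(outcome, cost_weight=0.1):
    """One user's scalar utility over the two objectives.
    The linear form and the weight are assumptions; the utility-based
    view only requires that some user-specific utility exists."""
    cost, service = outcome
    return service - cost_weight * cost

# The agent picks the compromise with the highest utility for this user.
best = max(outcomes, key=lambda name: utility(outcomes[name]))
```

A user who weights holding cost more heavily would prefer the other option, which is exactly why desirability of outcomes, rather than the raw objective vectors, should drive the compromise.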
Consequently, there was not yet a unified view on how to identify and approach these settings. As a first contribution, we develop a novel taxonomy to classify MOMADM settings. This allows us to offer a structured view of the field, to clearly delineate the current state of the art in multi-objective multi-agent decision-making approaches, and to identify promising directions for future research.

During the learning process in multi-objective multi-agent systems, agents receive a vector of values, with each component representing the performance on a different objective. In the case of self-interested agents (i.e., each with a possibly different preference over the objectives), finding trade-offs between conflicting interests becomes far from trivial. As a second contribution, we analyse and investigate game-theoretic equilibria under different multi-objective optimisation criteria and provide theoretical results concerning the existence of, and conditions for arriving at, such solutions in these scenarios. We additionally show that Nash equilibria need not exist in certain multi-objective multi-agent settings.

When each participant in the decision-making process has a different utility, it becomes essential for agents to learn about the behaviour of others. As a final contribution, we present the first study of the effects of opponent modelling on multi-objective multi-agent interactions. We contribute novel learning algorithms, along with extensions that incorporate opponent behaviour modelling and learning with opponent learning awareness (i.e., learning while anticipating one's impact on the opponent's learning step). Empirical results demonstrate that opponent learning awareness and modelling can drastically alter the learning dynamics. When Nash equilibria are present, opponent modelling can confer significant benefits on agents that implement it.
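The game-theoretic analysis mentioned above can be sketched concretely: in a small multi-objective game, each agent scalarises the shared payoff vector with its own utility, and a joint action is a pure Nash equilibrium when no agent can improve its own utility by deviating alone. The game, the utility functions, and the payoffs below are hypothetical, chosen only to show the mechanics of the check.

```python
import itertools

# A hypothetical 2x2 game with two-objective payoff vectors; both agents
# receive the same vector but value it differently. Payoffs are made up
# for illustration and do not come from the thesis.
payoffs = {
    (0, 0): (4.0, 0.0),
    (0, 1): (3.0, 1.0),
    (1, 0): (1.0, 3.0),
    (1, 1): (0.0, 4.0),
}

# Each agent scalarises the payoff vector with its own (possibly
# nonlinear) utility function.
utilities = [
    lambda v: v[0] * v[1],   # agent 0 prefers balanced objectives
    lambda v: v[0] + v[1],   # agent 1 simply sums the objectives
]

def pure_nash_equilibria():
    """Enumerate joint actions from which no agent gains (in its own
    utility) by unilaterally deviating."""
    equilibria = []
    for joint in itertools.product([0, 1], repeat=2):
        stable = True
        for agent, u in enumerate(utilities):
            for deviation in (0, 1):
                alt = list(joint)
                alt[agent] = deviation
                if u(payoffs[tuple(alt)]) > u(payoffs[joint]):
                    stable = False
        if stable:
            equilibria.append(joint)
    return equilibria
```

With other choices of nonlinear, opposed utilities, the same enumeration can return an empty set, which is the kind of nonexistence phenomenon the theoretical results address.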
When there are no Nash equilibria, opponent learning awareness and modelling allow agents to still converge to meaningful solutions.
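As a minimal sketch of what opponent modelling involves, the snippet below maintains a frequency-based estimate of an opponent's action distribution, a standard baseline form of opponent modelling. This is a deliberate simplification for illustration, not the thesis's exact algorithm.

```python
class OpponentModel:
    """Frequency-based model of an opponent's action choices, a common
    opponent-modelling baseline (a simplification, not the thesis's
    exact algorithm)."""

    def __init__(self, n_actions):
        # Start from a uniform (Laplace) pseudo-count prior.
        self.counts = [1] * n_actions

    def update(self, observed_action):
        # After each interaction, record the opponent's chosen action.
        self.counts[observed_action] += 1

    def predict(self):
        # Estimated distribution over the opponent's next action.
        total = sum(self.counts)
        return [c / total for c in self.counts]

model = OpponentModel(2)
for action in (0, 0, 1):   # observed opponent actions (illustrative)
    model.update(action)
probs = model.predict()    # -> [0.6, 0.4]
```

An agent can then best-respond to this predicted distribution under its own utility; opponent learning awareness goes a step further by also modelling how one's own update shifts the opponent's next learning step.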

Link VUB