ETROVUB

Raphaël Avalos

Thesis

Abstract

Many decision-making problems in robotics, autonomous systems, and multi-agent coordination involve partial observability, where agents must act based on incomplete and noisy information about the environment. This challenge often arises from the physical limitations of sensors, which may capture only certain aspects of the underlying state, sometimes unreliably. As a result, the agent must estimate the current situation, predict future outcomes, and choose actions under uncertainty. While most approaches assume the state is never available, in practice it can sometimes be accessed in two important ways: during training, for example through a simulator that provides ground-truth states, or at a cost during execution by activating high-precision auxiliary sensors. Such access makes state information a valuable but limited resource. This thesis investigates how strategic use of this resource can improve learning and planning under uncertainty. In reinforcement learning, we show how access to state information during training can supervise internal representations, stabilize learning, and improve performance at deployment without state access. In planning, we formalize and solve decision problems where the agent can request the state during execution, weighing the benefit of information against its cost. We propose three methods. First, we introduce Local Advantage Networks (LAN) for cooperative multi-agent reinforcement learning, which stabilizes the training of independent agents through a centralized value baseline without requiring joint action-value factorization, achieving state-of-the-art performance among value-based methods on the SMAC benchmark. Second, we present AEMS-SR, a graph-based online planning algorithm for POMDPs with state requests, which extends Anytime Error Minimization Search to handle costly state queries while retaining ε-optimality guarantees and outperforming standard planners in test domains. Third, we develop the Wasserstein Belief Updater (WBU) for model-based reinforcement learning, which leverages state access during training to learn latent belief updates. WBU provides theoretical guarantees on belief quality via bisimulation distances and achieves strong empirical performance in partially observable environments. Across these settings, we treat state access as a structured and limited resource. By showing when and how to exploit it in both learning and planning, this thesis demonstrates that agents can achieve more robust and effective decision making under partial observability.

Reference

Avalos, R 2026, 'Leveraging state access in partially observable sequential decision-making', Vrije Universiteit Brussel.

Avalos, R. (2026). Leveraging state access in partially observable sequential decision-making. [PhD Thesis, Vrije Universiteit Brussel]. Crazy Copy Center Productions.

@phdthesis{fe93e4c069034e6eb90b95033a3e6a19,
title = "Leveraging state access in partially observable sequential decision-making",
abstract = "Many decision making problems in robotics, autonomous systems, and multi-agent coordination involve partial observability, where agents must act based on incomplete and noisy information about the environment. This challenge often arises from the physical limitations of sensors, which may capture only certain aspects of the underlying state, sometimes unreliably. As a result, the agent must estimate the current situation, predict future outcomes, and choose actions under uncertainty. While most approaches assume the state is never available, in practice it can sometimes be accessed in two important ways: during training, for example through a simulator that provides ground-truth states, or at a cost during execution by activating high-precision auxiliary sensors. Such access makes state information a valuable but limited resource. This thesis investigates how strategic use of this resource can improve learning and planning under uncertainty. In reinforcement learning, we show how access to state information during training can supervise internal representations, stabilize learning, and improve performance at deployment without state access. In planning, we formalize and solve decision problems where the agent can request the state during execution, weighing the benefit of information against its cost. We propose three methods. First, we introduce Local Advantage Networks (LAN) for cooperative multi-agent reinforcement learning, which stabilizes the training of independent agents through a centralized value baseline without requiring joint action-value factorization, achieving state-of-the-art performance among value-based methods on the SMAC benchmark. Second, we present AEMS-SR, a graph-based online planning algorithm for POMDPs with state requests, which extends Anytime Error Minimization Search to handle costly state queries while retaining ε-optimality guarantees and outperforming standard planners in test domains. 
Third, we develop the Wasserstein Belief Updater (WBU) for model-based reinforcement learning, which leverages state access during training to learn latent belief updates. WBU provides theoretical guarantees on belief quality via bisimulation distances and achieves strong empirical performance in partially observable environments. Across these settings, we treat state access as a structured and limited resource. By showing when and how to exploit it in both learning and planning, this thesis demonstrates that agents can achieve more robust and effective decision making under partial observability.",
author = "Rapha{\"e}l Avalos",
year = "2026",
language = "English",
isbn = "9789493461413",
publisher = "Crazy Copy Center Productions",
address = "Belgium",
school = "Vrije Universiteit Brussel",
}