Willem Röpke, Raphaël Avalos, Roxana Radulescu, Ann Nowe, Diederik M. Roijers, Florent Delgrange
We introduce Optimal Transport MDPs (OT-MDPs), a framework for learning principled latent world models via optimal transport. Our approach formulates a generic optimal transport objective that trains a generative model of the environment by minimising a customisable cost function, which quantifies the discrepancy between latent and real trajectories. Through this perspective, we highlight the limitations of reconstruction-based methods and establish conditions on the cost function that enable theoretical guarantees. The quality of the learned model allows us to integrate reinforcement learning and planning methods. In particular, we leverage model-based value expansion to refine value estimates, providing rigorous theoretical justification. Additionally, we examine the use of Monte Carlo tree search and provide a theoretical analysis of the assumptions under which its application remains sound. Empirical evaluation across four MinAtar environments demonstrates that OT-MDPs yield high-fidelity models, leading to strong performance. Moreover, our results reveal challenges associated with planning in the latent model, suggesting critical directions for future research.
Röpke, W, Avalos, R, Radulescu, R, Nowe, A, Roijers, DM & Delgrange, F 2025, 'Integrating RL and Planning through Optimal Transport World Models', Paper presented at Adaptive and Learning Agents Workshop at AAMAS 2025, Detroit, United States, 19/05/25 - 20/05/25. <https://ala-workshop.github.io/papers/ALA2025_paper_39.pdf>
Röpke, W., Avalos, R., Radulescu, R., Nowe, A., Roijers, D. M., & Delgrange, F. (2025). Integrating RL and Planning through Optimal Transport World Models. Paper presented at Adaptive and Learning Agents Workshop at AAMAS 2025, Detroit, Michigan, United States. https://ala-workshop.github.io/papers/ALA2025_paper_39.pdf
@conference{f053b121cdb54ff081a74af1019b4b15,
title = "Integrating RL and Planning through Optimal Transport World Models",
abstract = "We introduce Optimal Transport MDPs (OT-MDPs), a framework for learning principled latent world models via optimal transport. Our approach formulates a generic optimal transport objective that trains a generative model of the environment by minimising a customisable cost function, which quantifies the discrepancy between latent and real trajectories. Through this perspective, we highlight the limitations of reconstruction-based methods and establish conditions on the cost function that enable theoretical guarantees. The quality of the learned model allows us to integrate reinforcement learning and planning methods. In particular, we leverage model-based value expansion to refine value estimates, providing rigorous theoretical justification. Additionally, we examine the use of Monte Carlo tree search and provide a theoretical analysis of the assumptions under which its application remains sound. Empirical evaluation across four MinAtar environments demonstrates that OT-MDPs yield high-fidelity models, leading to strong performance. Moreover, our results reveal challenges associated with planning in the latent model, suggesting critical directions for future research.",
keywords = "Reinforcement learning, Optimal transport, Representation Learning",
author = "Willem R{\"o}pke and Rapha{\"e}l Avalos and Roxana Radulescu and Ann Nowe and Roijers, {Diederik M.} and Florent Delgrange",
year = "2025",
month = may,
day = "19",
language = "English",
note = "Adaptive and Learning Agents Workshop at AAMAS 2025, ALA 2025; Conference date: 19-05-2025 through 20-05-2025",
url = "https://ala-workshop.github.io/papers/ALA2025_paper_39.pdf",
}