Bayesian Opponent-Aware Soft Q-Learning

M. Chacón Falcón, D. Ríos Insua

Adapting to opponents with unknown, non-stationary objectives is a core challenge in multi-agent reinforcement learning (RL). To address it, we introduce Bayesian Opponent-Aware Soft Q-Learning, a framework that bridges Adversarial Risk Analysis (ARA) and Maximum Entropy RL. Our approach augments the RL objective with an information-seeking bonus, yielding a convergent opponent-aware soft Q-operator. We also drop the assumption that opponent rewards are observable when modeling other agents and instead treat those rewards as learnable continuous latent variables. Combined with a learned transition model, this enables safe policy adaptation via offline imagination. Empirical results show that the framework infers hidden rewards and detects shifts in opponent strategies, enabling robust exploitation in uncertain Markov games.
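
For readers unfamiliar with soft Q-operators, the following is a minimal sketch of a generic soft Bellman backup in a small tabular Markov game, where the opponent's action is marginalized under a fixed belief. It is not the authors' operator: the information-seeking bonus, the latent opponent rewards, and the learned transition model mentioned in the abstract are omitted, and every name, shape, and parameter (`soft_value`, `soft_backup`, `opp_belief`, `alpha`, `gamma`) is an assumption made for illustration.

```python
import numpy as np

def soft_value(q, opp_belief, alpha):
    """Soft (log-sum-exp) state value under a belief over opponent actions.

    q:          array (S, A, B), Q-values for state, own action, opponent action
    opp_belief: array (S, B), assumed belief p(b | s) over opponent actions
    alpha:      entropy temperature (assumed hyperparameter)
    """
    # Average out the opponent action under the current belief.
    q_marg = np.einsum("sab,sb->sa", q, opp_belief)
    # Numerically stable alpha * log sum_a exp(q_marg / alpha).
    z = q_marg / alpha
    z_max = z.max(axis=1, keepdims=True)
    return alpha * (z_max[:, 0] + np.log(np.exp(z - z_max).sum(axis=1)))

def soft_backup(q, reward, transition, opp_belief, alpha=0.5, gamma=0.9):
    """One application of the illustrative soft Bellman backup.

    reward:     array (S, A, B)
    transition: array (S, A, B, S), rows sum to 1 over the last axis
    """
    v = soft_value(q, opp_belief, alpha)
    return reward + gamma * np.einsum("sabt,t->sab", transition, v)

def soft_q_iteration(reward, transition, opp_belief, alpha=0.5, gamma=0.9,
                     tol=1e-8, max_iter=10_000):
    """Iterate the backup until successive iterates differ by less than tol."""
    q = np.zeros_like(reward)
    for _ in range(max_iter):
        q_next = soft_backup(q, reward, transition, opp_belief, alpha, gamma)
        if np.max(np.abs(q_next - q)) < tol:
            return q_next
        q = q_next
    return q

# Toy problem with random (hypothetical) dynamics, rewards, and belief.
rng = np.random.default_rng(0)
S, A, B = 3, 2, 2
reward = rng.normal(size=(S, A, B))
transition = rng.dirichlet(np.ones(S), size=(S, A, B))  # shape (S, A, B, S)
opp_belief = rng.dirichlet(np.ones(B), size=S)          # shape (S, B)
q_star = soft_q_iteration(reward, transition, opp_belief)
```

In this simplified setting the backup is a contraction in the sup norm with modulus `gamma`, because both the log-sum-exp and the expectation over opponent actions are non-expansive, so the iteration converges to a unique fixed point. Whether the same argument carries over to the operator with the information-seeking bonus is something the abstract asserts but this sketch does not show.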

Keywords: Opponent modeling, Soft Q-Learning, Sequential Monte Carlo

Scheduled

Bayesian Inference Working Group (GT Inferencia Bayesiana): Young Bayesians Session in honor of Mª Eugenia Castellanos

September 5, 2026 10:00 AM

Room 20 (Aula 20)

Other papers in the same session

A Unified Bayesian Framework for Adversarial Robustness

P. García Arce, R. Naveiro, D. Ríos Insua

MissingBVS: an R package for implementing Bayesian Variable Selection in the presence of missing data

C. Mulet, G. García-Donato

Bayesian Online Test Time Adaptation

D. Corrales Alonso, D. Ríos Insua
