Bayesian Opponent-Aware Soft Q-Learning
M. Chacón Falcón, D. Ríos Insua
Adapting to opponents whose objectives are unknown and non-stationary is a core challenge in multi-agent reinforcement learning. To address it, we introduce Bayesian Opponent-Aware Soft Q-Learning, a novel framework that bridges Adversarial Risk Analysis (ARA) and maximum-entropy RL. Our approach augments the RL objective with an information-seeking bonus, yielding a convergent opponent-aware soft Q-operator. We also drop the assumption that opponent rewards are observable and instead treat those rewards as learnable continuous latent variables. Combined with a learned transition model, this enables safe policy adaptation via offline imagination. Empirical results show that the framework infers hidden rewards and detects shifts in opponent strategies, enabling robust exploitation in uncertain Markov games.
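As a rough illustration of what an opponent-aware soft backup might look like, the sketch below shows a generic tabular soft Q backup that marginalizes over a predicted opponent policy, plus a Bayes reweighting step for a particle set of candidate opponent models. It is not the authors' operator: the information-seeking bonus is omitted because the abstract does not give its form, and every name here (`soft_backup`, `reweight_particles`, the array shapes, `alpha`, `gamma`) is a hypothetical choice made for this example.

```python
import numpy as np

def logsumexp(x, axis):
    # Numerically stable log-sum-exp along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def soft_value(q, opp_policy, alpha):
    # q: (S, A, B) Q-values over state, own action, opponent action.
    # opp_policy: (S, B) predicted opponent action probabilities (assumed given).
    q_marg = np.einsum("sab,sb->sa", q, opp_policy)  # expectation over opponent
    return alpha * logsumexp(q_marg / alpha, axis=1)  # soft maximum over own actions

def soft_backup(q, reward, trans, opp_policy, alpha=1.0, gamma=0.9):
    # reward: (S, A, B); trans: (S, A, B, S') transition probabilities.
    v = soft_value(q, opp_policy, alpha)
    return reward + gamma * np.einsum("sabt,t->sab", trans, v)

def reweight_particles(weights, particle_policies, state, opp_action):
    # weights: (K,) particle weights; particle_policies: (K, S, B).
    # Bayes update after observing one opponent action in one state.
    w = weights * particle_policies[:, state, opp_action]
    return w / w.sum()

if __name__ == "__main__":
    S, A, B = 2, 3, 2
    q = np.zeros((S, A, B))
    reward = np.zeros((S, A, B))
    trans = np.full((S, A, B, S), 1.0 / S)
    opp = np.full((S, B), 1.0 / B)
    print(soft_backup(q, reward, trans, opp))
```

With a fixed opponent policy, this backup is a gamma-contraction in the sup norm, so repeated application converges. In a fuller treatment, `opp_policy` would presumably be the mixture of particle policies under the current weights, which is an assumption of this sketch rather than something the abstract states.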
Keywords: Opponent modeling, Soft Q-Learning, Sequential Monte Carlo
Scheduled
Bayesian Inference Working Group: Young Bayesians Session in honor of Mª Eugenia Castellanos
September 5, 2026, 10:00
Room 20
Other talks in the same session
P. García Arce, R. Naveiro, D. Ríos Insua
C. Mulet, G. García-Donato
D. Corrales Alonso, D. Ríos Insua