M. Chacón Falcón, D. Rios Insua

Adapting to opponents whose objectives are unknown and non-stationary is a core challenge in multi-agent reinforcement learning (RL). To address it, we introduce Bayesian Opponent-Aware Soft Q-Learning, a framework that bridges Adversarial Risk Analysis (ARA) and maximum entropy RL. Our approach augments the RL objective with an information-seeking bonus, yielding a convergent opponent-aware soft Q-operator. We also drop the assumption that opponent rewards are observable and instead treat them as learnable continuous latent variables. Combined with a learned transition model, this enables safe policy adaptation via offline imagination. Empirical results show that our framework infers hidden rewards and detects shifts in opponent strategies, enabling robust exploitation in uncertain Markov games.

Keywords: Opponent modeling, Soft Q-Learning, Sequential Monte Carlo
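
As a rough illustration of the kind of operator the abstract refers to, the following Python sketch applies a soft (maximum entropy) Bellman backup that averages over opponent actions under a predicted opponent policy, and reweights particles over a latent opponent reward parameter in the spirit of Sequential Monte Carlo. It is not the authors' operator: the information-seeking bonus is omitted, the setting is tabular, and all names (`soft_value`, `opponent_aware_soft_backup`, `reweight_particles`, the array layouts, `alpha`, `gamma`) are assumptions made for illustration.

```python
import numpy as np


def soft_value(q, alpha):
    """Soft state value V(s) = alpha * log sum_a exp(Q(s, a) / alpha).

    q has shape (S, A); the result has shape (S,).
    """
    z = q / alpha
    m = z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    return alpha * (m[..., 0] + np.log(np.exp(z - m).sum(axis=-1)))


def opponent_aware_soft_backup(Q, R, P, opp_policy, gamma, alpha):
    """One soft Bellman backup, marginalizing over opponent actions.

    Assumed (hypothetical) array layouts:
      Q:          (S, A)        agent soft Q-values
      R:          (S, A, B)     agent reward for joint action (a, b)
      P:          (S, A, B, S)  transition probabilities to the next state
      opp_policy: (S, B)        predicted opponent action probabilities
    """
    V = soft_value(Q, alpha)      # (S,)
    target = R + gamma * (P @ V)  # (S, A, B): reward plus discounted soft value
    # Expectation over the opponent's action b under the predicted policy.
    return np.einsum("sab,sb->sa", target, opp_policy)


def reweight_particles(weights, likelihoods):
    """Bayes update of particle weights over a latent opponent parameter.

    likelihoods[i] is the probability that particle i assigns to the
    opponent action that was actually observed.
    """
    w = weights * likelihoods
    return w / w.sum()
```

In a sketch like this, `opp_policy` could be formed by mixing the opponent policies implied by each particle, weighted by the output of `reweight_particles`, so that the backup always uses the current belief about the opponent.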

Scheduled

Bayesian Inference Working Group: Young Bayesians Session in honor of Mª Eugenia Castellanos
September 5, 2026  10:00 AM
Aula 20



