A Bayesian reinforcement learning approach in Markov games for computing near-optimal policies

Author
Julio B. Clempner
Affiliations
[1] Instituto Politécnico Nacional (National Polytechnic Institute), Escuela Superior de Física y Matemáticas (School of Physics and Mathematics)
[2] Building 9
[3] Av. Instituto Politécnico Nacional
Keywords
Reinforcement learning; Bayesian inference; Markov games with private information; Bayesian equilibrium
Mathematics Subject Classification: 91A10; 91A40; 91A26; 62C10; 60J20
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Bayesian learning is an inference method designed to tackle the exploration-exploitation trade-off as a function of the uncertainty of a given probability model, estimated from observations, within the reinforcement learning (RL) paradigm. It allows prior knowledge to be incorporated into the algorithms in the form of probability distributions. Finding the resulting Bayes-optimal policies is a notoriously hard problem. We focus our attention on RL for a special kind of ergodic and controllable Markov games. We propose a new framework for computing the near-optimal policies for each agent, where it is assumed that the Markov chains are regular and the inverse of the behavior strategy is well defined. A fundamental result of this paper is the development of a theoretical method that, based on the formulation of a non-linear problem, computes the near-optimal adaptive-behavior strategies and policies of the game that maximize the expected reward under some restrictions. We prove that such behavior strategies and policies satisfy the Bayesian-Nash equilibrium. Another important result is that the RL process learns a model through the interaction of the agents with the environment; we show how the proposed method can finitely approximate and estimate the elements of the transition matrices and utilities while maintaining an efficient long-term learning performance measure. We develop the algorithm for implementing this model. A numerical example shows how to deploy the estimation process as a function of agent experiences.
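To make the idea of estimating transition-matrix elements from agent experience concrete, the following is a minimal single-agent sketch. It is not the paper's algorithm: it assumes a symmetric Dirichlet prior over each row of the transition matrix and reports the posterior mean, which is one common Bayesian estimator. The function name `posterior_mean_transitions`, the `prior` parameter, and the toy `experience` data are all hypothetical and chosen only for illustration.

```python
def posterior_mean_transitions(transitions, n_states, n_actions, prior=1.0):
    """Posterior-mean estimate of P(s' | s, a) under a symmetric Dirichlet prior.

    transitions: iterable of (state, action, next_state) index triples.
    prior: Dirichlet concentration per entry (an assumed value, not from the paper).
    """
    # counts[s][a][s_next] = number of observed transitions s --a--> s_next
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s, a, s_next in transitions:
        counts[s][a][s_next] += 1

    estimate = []
    for s in range(n_states):
        per_action = []
        for a in range(n_actions):
            # Dirichlet-multinomial conjugacy: add the prior pseudo-counts,
            # then normalize so each row is a probability distribution.
            total = sum(counts[s][a]) + prior * n_states
            per_action.append([(c + prior) / total for c in counts[s][a]])
        estimate.append(per_action)
    return estimate


# Toy experience: (state, action, next_state) triples observed by one agent.
experience = [(0, 0, 1), (0, 0, 1), (0, 0, 0), (1, 0, 0)]
P_hat = posterior_mean_transitions(experience, n_states=2, n_actions=1)
```

With more observations, the pseudo-counts from the prior matter less and the estimate approaches the empirical transition frequencies. The multi-agent setting with private information described in the abstract would need one such estimate per agent, plus an estimate of the utilities.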
Pages: 675-690
Page count: 15