A Bayesian reinforcement learning approach in Markov games for computing near-optimal policies

Authors
Julio B. Clempner
Affiliations
[1] Instituto Politécnico Nacional (National Polytechnic Institute), Escuela Superior de Física y Matemáticas (School of Physics and Mathematics), Building 9, Av. Instituto Politécnico Nacional
Keywords
Reinforcement learning; Bayesian inference; Markov games with private information; Bayesian equilibrium
Mathematics Subject Classification: 91A10; 91A40; 91A26; 62C10; 60J20
Abstract
Bayesian learning is an inference method designed to tackle the exploration-exploitation trade-off as a function of the uncertainty of a given probability model, learned from observations, within the reinforcement learning (RL) paradigm. It allows prior knowledge to be incorporated into the algorithms in the form of probability distributions. Finding the resulting Bayes-optimal policies is a notoriously hard problem. We focus on RL for a special class of ergodic and controllable Markov games. We propose a new framework for computing near-optimal policies for each agent, assuming that the Markov chains are regular and that the inverse of the behavior strategy is well defined. A fundamental result of this paper is a theoretical method that, based on the formulation of a nonlinear problem, computes the near-optimal adaptive behavior strategies and policies of the game that maximize the expected reward under some restrictions. We prove that these behavior strategies and policies satisfy the Bayesian-Nash equilibrium. Another important result is that the RL process learns a model through the interaction of the agents with the environment; we show how the proposed method can finitely approximate and estimate the elements of the transition matrices and utilities while maintaining an efficient long-term learning performance measure. We develop an algorithm for implementing this model. A numerical example shows how to deploy the estimation process as a function of the agents' experiences.
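The idea of estimating transition-matrix entries from agent experience under a prior can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a single agent, a symmetric Dirichlet prior on each row of the transition matrix, and a hypothetical experience log of (state, action, next_state) triples. The function name `estimate_transitions` and all parameters are illustrative.

```python
from collections import defaultdict


def estimate_transitions(experiences, n_states, n_actions, prior=1.0):
    """Posterior-mean estimate of P(next_state | state, action).

    Assumption: a symmetric Dirichlet(prior) prior on every row. The prior
    and the (state, action, next_state) layout are illustrative choices,
    not taken from the paper.
    """
    # Count how often each transition was observed.
    counts = defaultdict(float)
    for state, action, next_state in experiences:
        counts[(state, action, next_state)] += 1.0

    table = {}
    for s in range(n_states):
        for a in range(n_actions):
            row_total = sum(counts[(s, a, t)] for t in range(n_states))
            denom = row_total + prior * n_states
            # Dirichlet posterior mean: (count + prior) / (total + prior * n_states).
            table[(s, a)] = [
                (counts[(s, a, t)] + prior) / denom for t in range(n_states)
            ]
    return table


# Hypothetical experience log with two states and one action.
log = [(0, 0, 1), (0, 0, 1), (0, 0, 0), (1, 0, 0)]
P = estimate_transitions(log, n_states=2, n_actions=1)
```

With few observations the prior dominates and each row stays close to uniform; as the log grows, each row approaches the empirical transition frequencies.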
Pages: 675-690
Number of pages: 15