A Bayesian reinforcement learning approach in markov games for computing near-optimal policies

Cited by: 0
Author
Julio B. Clempner
Affiliation
[1] Escuela Superior de Física y Matemáticas (School of Physics and Mathematics), Instituto Politécnico Nacional (National Polytechnic Institute), Building 9, Av. Instituto Politécnico Nacional
Keywords
Reinforcement learning; Bayesian inference; Markov games with private information; Bayesian equilibrium; MSC: 91A10, 91A40, 91A26, 62C10, 60J20
DOI: not available
Abstract
Bayesian learning is an inference method designed to tackle the exploration-exploitation trade-off as a function of the uncertainty of a given probability model estimated from observations, within the Reinforcement Learning (RL) paradigm. It allows prior knowledge to be incorporated into the algorithms in the form of probability distributions. Finding the resulting Bayes-optimal policies is a notoriously hard problem. We focus on RL for a special class of ergodic and controllable Markov games. We propose a new framework for computing near-optimal policies for each agent, under the assumptions that the Markov chains are regular and that the inverse of the behavior strategy is well defined. A fundamental result of this paper is a theoretical method that, based on the formulation of a non-linear problem, computes the near-optimal adaptive behavior strategies and policies of the game, under restrictions that maximize the expected reward. We prove that these behavior strategies and policies satisfy a Bayesian-Nash equilibrium. Another important result is that the RL process learns a model through the agents' interaction with the environment; we show how the proposed method can finitely approximate and estimate the elements of the transition matrices and utilities while maintaining an efficient long-term learning performance measure. We develop an algorithm for implementing this model. A numerical example shows how to deploy the estimation process as a function of the agents' experience.
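To make the estimation step of the abstract concrete, the sketch below (Python, not taken from the paper) shows one standard way such a Bayesian RL model can be maintained: each agent keeps a Dirichlet posterior over every row of its transition matrix, updates it conjugately from observed transitions, and can either use the posterior mean as a point estimate or sample from the posterior to drive exploration. The class name `BayesianTransitionModel`, the toy two-state game, and all parameter names are illustrative assumptions; the paper's actual method additionally solves a non-linear problem for the equilibrium behavior strategies, which is not reproduced here.

```python
# Minimal sketch (assumed, not the paper's algorithm): Dirichlet-multinomial
# posterior over the transition matrices P(s' | s, a) of one agent.
import numpy as np

class BayesianTransitionModel:
    """Dirichlet posterior over each row of an agent's transition matrices."""

    def __init__(self, n_states: int, n_actions: int, prior: float = 1.0):
        # Symmetric Dirichlet prior; strictly positive pseudo-counts are
        # consistent with the regular (ergodic) chains assumed in the paper.
        self.counts = np.full((n_states, n_actions, n_states), prior)

    def update(self, s: int, a: int, s_next: int) -> None:
        # Conjugate update: one observed transition adds one pseudo-count.
        self.counts[s, a, s_next] += 1.0

    def posterior_mean(self) -> np.ndarray:
        # Point estimate of the transition matrices (posterior expectation).
        return self.counts / self.counts.sum(axis=2, keepdims=True)

    def sample(self, rng: np.random.Generator) -> np.ndarray:
        # Thompson-style draw: sample each row from its Dirichlet posterior,
        # a common way to trade off exploration against exploitation.
        flat = self.counts.reshape(-1, self.counts.shape[-1])
        sampled = np.array([rng.dirichlet(row) for row in flat])
        return sampled.reshape(self.counts.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 2-state, 2-action environment used only for illustration.
    true_P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                       [[0.5, 0.5], [0.7, 0.3]]])
    model = BayesianTransitionModel(n_states=2, n_actions=2)

    s = 0
    for _ in range(5000):  # interact with the environment, update the posterior
        a = int(rng.integers(2))
        s_next = int(rng.choice(2, p=true_P[s, a]))
        model.update(s, a, s_next)
        s = s_next

    # The posterior mean approaches true_P as experience accumulates.
    print(np.round(model.posterior_mean(), 2))
```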
Pages: 675-690 (15 pages)
Related papers (50 in total)
  • [1] A Bayesian reinforcement learning approach in markov games for computing near-optimal policies
    Clempner, Julio B.
    Annals of Mathematics and Artificial Intelligence, 2023, 91(5): 675-690
  • [2] Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies
    Asiain, Erick
    Clempner, Julio B.
    Poznyak, Alexander S.
    Soft Computing, 2019, 23(11): 3591-3604
  • [3] Polynomial-time reinforcement learning of near-optimal policies
    Pivazyan, K.
    Shoham, Y.
    Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-02) / Fourteenth Innovative Applications of Artificial Intelligence Conference (IAAI-02), 2002: 205-210
  • [4] Computing Near-Optimal Policies in Generalized Joint Replenishment
    Adelman, Daniel
    Klabjan, Diego
    INFORMS Journal on Computing, 2012, 24(1): 148-164
  • [5] Near-optimal reinforcement learning in polynomial time
    Kearns, M.
    Singh, S.
    Machine Learning, 2002, 49(2-3): 209-232
  • [6] Near-optimal regret bounds for reinforcement learning
    Jaksch, Thomas
    Ortner, Ronald
    Auer, Peter
    Journal of Machine Learning Research, 2010, 11: 1563-1600
  • [7] Reinforcement Learning for Near-Optimal Design of Zero-Delay Codes for Markov Sources
    Cregg, Liam
    Linder, Tamas
    Yuksel, Serdar
    IEEE Transactions on Information Theory, 2024, 70(11): 8399-8413