Exploiting Similarity Information in Reinforcement Learning: Similarity Models for Multi-Armed Bandits and MDPs

Cited: 0
Authors
Ortner, Ronald [1 ]
Affiliation
[1] Univ Leoben, Lehrstuhl Informat Technol, Leoben, Austria
Funding
Austrian Science Fund (FWF);
Keywords
Reinforcement learning; Markov decision process; Multi-armed bandit; Similarity; Regret;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
This paper considers reinforcement learning problems with additional similarity information. We start with the simple setting of multi-armed bandits in which the learner knows for each arm its color, where it is assumed that arms of the same color have close mean rewards. An algorithm is presented which shows that this color information can be used to improve the dependency of online regret bounds on the number of arms. Further, we discuss to what extent this approach can be extended to the more general case of Markov decision processes. For the simplest case, in which the same color for actions means similar rewards and identical transition probabilities, an algorithm and a corresponding online regret bound are given. For the general case, in which the same color implies only close but not necessarily identical transition probabilities, we give upper and lower bounds on the error incurred by aggregating actions according to the color information. These bounds also imply that the general case is far more difficult to handle.
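
To make the bandit setting from the abstract concrete, here is a minimal Python sketch of a two-level UCB strategy that exploits color information: a UCB index over colors (treating each color as a meta-arm whose members' means are assumed eps-close) selects a color, and a standard UCB1 index selects an arm within it. The instance (MEANS, COLORS), the eps slack, and the two-level structure are illustrative assumptions for exposition; this is not the algorithm analysed in the paper.

import math
import random

# Hypothetical instance: each arm has a hidden mean reward and a known
# color; arms of the same color are assumed to have eps-close means.
MEANS  = [0.90, 0.85, 0.30, 0.35, 0.32, 0.50]  # hidden from the learner
COLORS = [0, 0, 1, 1, 1, 2]                    # known to the learner

def pull(arm):
    # Bernoulli reward drawn from the arm's hidden mean.
    return 1.0 if random.random() < MEANS[arm] else 0.0

def colored_ucb(horizon, eps=0.10):
    # Two-level UCB sketch (illustration only, not the paper's algorithm):
    # level 1 picks a color by a UCB index over colors, level 2 picks an
    # arm inside that color by a standard UCB1 index.
    counts = [0] * len(MEANS)
    sums = [0.0] * len(MEANS)
    by_color = {}
    for arm, c in enumerate(COLORS):
        by_color.setdefault(c, []).append(arm)

    total = 0.0
    for t in range(1, horizon + 1):
        def color_index(c):
            # Confidence width uses the color's total pull count, plus an
            # eps slack because same-colored means are only eps-close.
            n_c = sum(counts[a] for a in by_color[c])
            if n_c == 0:
                return float("inf")
            best = max(sums[a] / counts[a]
                       for a in by_color[c] if counts[a] > 0)
            return best + math.sqrt(2.0 * math.log(t) / n_c) + eps

        def arm_index(a):
            # Standard UCB1 index within the chosen color.
            if counts[a] == 0:
                return float("inf")
            return sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])

        color = max(by_color, key=color_index)
        arm = max(by_color[color], key=arm_index)
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

if __name__ == "__main__":
    random.seed(0)
    T = 10000
    print("average reward over %d rounds: %.3f" % (T, colored_ucb(T) / T))

Treating colors as meta-arms is what shrinks the effective problem size: the top-level exploration cost scales with the number of colors rather than the number of arms, which loosely mirrors the improved regret dependency the abstract describes.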
Pages: 203-210
Page count: 8
Related Papers
50 records in total
  • [1] Liu, Yi-Pei; Li, Kuo; Cao, Xi; Jia, Qing-Shan; Wang, Xu. Quantum Reinforcement Learning for Multi-Armed Bandits. 2022 41st Chinese Control Conference (CCC), 2022: 5675-5680.
  • [2] Antos, Andras; Grover, Varun; Szepesvari, Csaba. Active Learning in Multi-armed Bandits. Algorithmic Learning Theory, Proceedings, 2008, 5254: 287+.
  • [3] Roijers, Diederik M.; Zintgraf, Luisa M.; Libin, Pieter; Reymond, Mathieu; Bargiacchi, Eugenio; Nowe, Ann. Interactive Multi-objective Reinforcement Learning in Multi-armed Bandits with Gaussian Process Utility Models. Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2020, Pt III, 2021, 12459: 463-478.
  • [4] Cai, Changxiao; Cai, T. Tony; Li, Hongzhe. Transfer Learning for Contextual Multi-armed Bandits. Annals of Statistics, 2024, 52(1): 207-232.
  • [5] Thaker, Parth K.; Malu, Mohit; Rao, Nikhil; Dasarathy, Gautam. Maximizing and Satisficing in Multi-armed Bandits with Graph Information. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [6] Weinberger, Nir; Yemini, Michal. Multi-Armed Bandits With Self-Information Rewards. IEEE Transactions on Information Theory, 2023, 69(11): 7160-7184.
  • [7] Chowdhury, Sayak Ray; Gopalan, Aditya. On Kernelized Multi-armed Bandits. International Conference on Machine Learning, Vol 70, 2017.
  • [8] Wang, Siwei; Huang, Longbo. Multi-armed Bandits with Compensation. Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, 31.
  • [9] Wang, Zhiyang; Zhou, Ruida; Shen, Cong. Regional Multi-Armed Bandits. International Conference on Artificial Intelligence and Statistics, Vol 84, 2018.
  • [10] Shi, Chengshuai; Shen, Cong. Federated Multi-Armed Bandits. Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence, 2021, 35: 9603-9611.