EXPLOITING SIMILARITY INFORMATION IN REINFORCEMENT LEARNING: Similarity Models for Multi-Armed Bandits and MDPs

Cited: 0
Authors
Ortner, Ronald [1]
Affiliations
[1] Univ Leoben, Lehrstuhl Informat Technol, Leoben, Austria
Funding
Austrian Science Fund;
Keywords
Reinforcement learning; Markov decision process; Multi-armed bandit; Similarity; Regret;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
摘要
This paper considers reinforcement learning problems with additional similarity information. We start with the simple setting of multi-armed bandits in which the learner knows for each arm its color, under the assumption that arms of the same color have close mean rewards. We present an algorithm which shows that this color information can be used to improve the dependence of online regret bounds on the number of arms. Further, we discuss to what extent this approach can be extended to the more general case of Markov decision processes. For the simplest case, in which the same color for actions means similar rewards and identical transition probabilities, we give an algorithm and a corresponding online regret bound. For the general case, in which the same color implies only close but not necessarily identical transition probabilities, we derive upper and lower bounds on the error incurred by aggregating actions according to the color information. These bounds also show that the general case is far more difficult to handle.
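To make the colored-bandit idea concrete, below is a minimal Python sketch of one plausible two-level UCB strategy; it is an illustrative assumption, not the algorithm of the paper. Reward statistics are pooled within each color until the pooled confidence width falls below the assumed within-color reward gap epsilon, and only then are arms inside the chosen color discriminated individually. The class name ColoredUCB, the pooling rule, and all constants are hypothetical.

    import math
    import random
    from collections import defaultdict

    class ColoredUCB:
        """Illustrative two-level UCB for bandits with color information.

        Assumption (from the problem setting): arms sharing a color have
        mean rewards within epsilon of each other. This is a sketch of one
        plausible way to use colors, not the algorithm from the paper.
        """

        def __init__(self, colors, epsilon):
            self.colors = colors              # dict: color -> list of arm ids
            self.epsilon = epsilon            # assumed within-color reward gap
            self.counts = defaultdict(int)    # pulls per arm
            self.sums = defaultdict(float)    # summed rewards per arm
            self.t = 0                        # total number of pulls so far

        def _arm_ucb(self, arm):
            n = self.counts[arm]
            if n == 0:
                return float("inf")
            return self.sums[arm] / n + math.sqrt(2 * math.log(self.t + 1) / n)

        def _color_stats(self, color):
            n = sum(self.counts[a] for a in self.colors[color])
            s = sum(self.sums[a] for a in self.colors[color])
            return n, s

        def _color_ucb(self, color):
            # Pooled mean + confidence width + epsilon; the epsilon term
            # covers the bias of pooling arms whose true means may differ
            # by up to epsilon.
            n, s = self._color_stats(color)
            if n == 0:
                return float("inf")
            return s / n + math.sqrt(2 * math.log(self.t + 1) / n) + self.epsilon

        def select_arm(self):
            self.t += 1
            best_color = max(self.colors, key=self._color_ucb)
            n, _ = self._color_stats(best_color)
            width = math.sqrt(2 * math.log(self.t + 1) / max(n, 1))
            if width > self.epsilon:
                # Colors not yet resolved: spread pulls evenly inside the
                # chosen color so the pooled statistics stay informative.
                return min(self.colors[best_color], key=lambda a: self.counts[a])
            # Colors resolved to within epsilon: refine with per-arm UCB
            # inside the chosen color only.
            return max(self.colors[best_color], key=self._arm_ucb)

        def update(self, arm, reward):
            self.counts[arm] += 1
            self.sums[arm] += reward

    # Toy usage: two colors, Bernoulli rewards close within each color.
    means = {"a1": 0.80, "a2": 0.78, "b1": 0.30, "b2": 0.32}
    bandit = ColoredUCB({"red": ["a1", "a2"], "blue": ["b1", "b2"]}, epsilon=0.05)
    for _ in range(10000):
        arm = bandit.select_arm()
        bandit.update(arm, 1.0 if random.random() < means[arm] else 0.0)

Under this scheme the color-level exploration cost scales with the number of colors rather than the number of arms, mirroring the improved dependence of regret bounds on the number of arms described in the abstract.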
Pages: 203 - 210
Page count: 8
Related Papers
50 items in total
  • [31] Multi-armed bandits and the Gittins index
    Whittle, P.
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1980, 42 (02) : 143 - 149
  • [32] Multi-armed bandits with switching penalties
    Asawa, M.
    Teneketzis, D.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1996, 41 (03) : 328 - 348
  • [33] Distributed learning dynamics of Multi-Armed Bandits for edge intelligence
    Chen, Shuzhen
    Tao, Youming
    Yu, Dongxiao
    Li, Feng
    Gong, Bei
    JOURNAL OF SYSTEMS ARCHITECTURE, 2021, 114
  • [34] On Optimal Foraging and Multi-armed Bandits
    Srivastava, Vaibhav
    Reverdy, Paul
    Leonard, Naomi E.
    2013 51ST ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2013, : 494 - 499
  • [35] PAC-Bayesian lifelong learning for multi-armed bandits
    Flynn, Hamish
    Reeb, David
    Kandemir, Melih
    Peters, Jan
    DATA MINING AND KNOWLEDGE DISCOVERY, 2022, 36 (02) : 841 - 876
  • [36] Multi-Armed Bandits with Cost Subsidy
    Sinha, Deeksha
    Sankararaman, Karthik Abinav
    Kazerouni, Abbas
    Avadhanula, Vashist
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [37] Multi-Armed Bandits With Correlated Arms
    Gupta, Samarth
    Chaudhari, Shreyas
    Joshi, Gauri
    Yagan, Osman
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2021, 67 (10) : 6711 - 6732
  • [38] Batched Multi-armed Bandits Problem
    Gao, Zijun
    Han, Yanjun
    Ren, Zhimei
    Zhou, Zhengqing
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [39] Multi-armed bandits: Theory and applications to online learning in networks
    Zhao, Qing
    Morgan & Claypool Publishers, 2019, 12 : 1 - 165
  • [40] Human-AI Learning Performance in Multi-Armed Bandits
    Pandya, Ravi
    Huang, Sandy H.
    Hadfield-Menell, Dylan
    Dragan, Anca D.
    AIES '19: PROCEEDINGS OF THE 2019 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, 2019, : 369 - 375