Exploiting Similarity Information in Reinforcement Learning: Similarity Models for Multi-Armed Bandits and MDPs

Cited: 0
Authors
Ortner, Ronald [1 ]
Affiliation
[1] Univ Leoben, Lehrstuhl Informat Technol, Leoben, Austria
Funding
Austrian Science Fund (FWF);
Keywords
Reinforcement learning; Markov decision process; Multi-armed bandit; Similarity; Regret;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
This paper considers reinforcement learning problems with additional similarity information. We start with the simple setting of multi-armed bandits in which the learner knows for each arm its color, where it is assumed that arms of the same color have close mean rewards. An algorithm is presented which shows that this color information can be used to improve the dependency of online regret bounds on the number of arms. Further, we discuss to what extent this approach can be extended to the more general case of Markov decision processes. For the simplest case, in which the same color for actions means similar rewards and identical transition probabilities, an algorithm and a corresponding online regret bound are given. For the general case, in which the same color implies only close but not necessarily identical transition probabilities, we give upper and lower bounds on the error incurred by aggregating actions according to the color information. These bounds also imply that the general case is far more difficult to handle.
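
To make the bandit setting from the abstract concrete, here is a minimal Python sketch of a two-level UCB strategy that exploits color information: a UCB index over colors (treating each color as a meta-arm whose members' means are assumed eps-close) selects a color, and a standard UCB1 index selects an arm within it. The instance (MEANS, COLORS), the eps slack, and the two-level structure are illustrative assumptions for exposition; this is not the algorithm analysed in the paper.

import math
import random

# Hypothetical instance: each arm has a hidden mean reward and a known
# color; arms of the same color are assumed to have eps-close means.
MEANS  = [0.90, 0.85, 0.30, 0.35, 0.32, 0.50]  # hidden from the learner
COLORS = [0, 0, 1, 1, 1, 2]                    # known to the learner

def pull(arm):
    # Bernoulli reward drawn from the arm's hidden mean.
    return 1.0 if random.random() < MEANS[arm] else 0.0

def colored_ucb(horizon, eps=0.10):
    # Two-level UCB sketch (illustration only, not the paper's algorithm):
    # level 1 picks a color by a UCB index over colors, level 2 picks an
    # arm inside that color by a standard UCB1 index.
    counts = [0] * len(MEANS)
    sums = [0.0] * len(MEANS)
    by_color = {}
    for arm, c in enumerate(COLORS):
        by_color.setdefault(c, []).append(arm)

    total = 0.0
    for t in range(1, horizon + 1):
        def color_index(c):
            # Confidence width uses the color's total pull count, plus an
            # eps slack because same-colored means are only eps-close.
            n_c = sum(counts[a] for a in by_color[c])
            if n_c == 0:
                return float("inf")
            best = max(sums[a] / counts[a]
                       for a in by_color[c] if counts[a] > 0)
            return best + math.sqrt(2.0 * math.log(t) / n_c) + eps

        def arm_index(a):
            # Standard UCB1 index within the chosen color.
            if counts[a] == 0:
                return float("inf")
            return sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])

        color = max(by_color, key=color_index)
        arm = max(by_color[color], key=arm_index)
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

if __name__ == "__main__":
    random.seed(0)
    T = 10000
    print("average reward over %d rounds: %.3f" % (T, colored_ucb(T) / T))

Treating colors as meta-arms is what shrinks the effective problem size: the top-level exploration cost scales with the number of colors rather than the number of arms, which loosely mirrors the improved regret dependency the abstract describes.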
Pages: 203-210
Page count: 8
Related Papers
50 records in total
  • [1] Liu, Yi-Pei; Li, Kuo; Cao, Xi; Jia, Qing-Shan; Wang, Xu. Quantum Reinforcement Learning for Multi-Armed Bandits. 2022 41st Chinese Control Conference (CCC), 2022: 5675-5680.
  • [2] Antos, Andras; Grover, Varun; Szepesvari, Csaba. Active Learning in Multi-armed Bandits. Algorithmic Learning Theory, Proceedings, 2008, 5254: 287+.
  • [3] Roijers, Diederik M.; Zintgraf, Luisa M.; Libin, Pieter; Reymond, Mathieu; Bargiacchi, Eugenio; Nowe, Ann. Interactive Multi-objective Reinforcement Learning in Multi-armed Bandits with Gaussian Process Utility Models. Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2020, Pt III, 2021, 12459: 463-478.
  • [4] Cai, Changxiao; Cai, T. Tony; Li, Hongzhe. Transfer Learning for Contextual Multi-armed Bandits. Annals of Statistics, 2024, 52(1): 207-232.
  • [5] Thaker, Parth K.; Malu, Mohit; Rao, Nikhil; Dasarathy, Gautam. Maximizing and Satisficing in Multi-armed Bandits with Graph Information. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [6] Weinberger, Nir; Yemini, Michal. Multi-Armed Bandits With Self-Information Rewards. IEEE Transactions on Information Theory, 2023, 69(11): 7160-7184.
  • [7] Chowdhury, Sayak Ray; Gopalan, Aditya. On Kernelized Multi-armed Bandits. International Conference on Machine Learning, Vol 70, 2017.
  • [8] Wang, Siwei; Huang, Longbo. Multi-armed Bandits with Compensation. Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, 31.
  • [9] Wang, Zhiyang; Zhou, Ruida; Shen, Cong. Regional Multi-Armed Bandits. International Conference on Artificial Intelligence and Statistics, Vol 84, 2018.
  • [10] Shi, Chengshuai; Shen, Cong. Federated Multi-Armed Bandits. Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence, 2021, 35: 9603-9611.