Exploiting Similarity Information in Reinforcement Learning: Similarity Models for Multi-Armed Bandits and MDPs

Cited: 0
Authors
Ortner, Ronald [1]
Affiliations
[1] Univ Leoben, Lehrstuhl Informat Technol, Leoben, Austria
Funding
Austrian Science Fund (FWF)
Keywords
Reinforcement learning; Markov decision process; Multi-armed bandit; Similarity; Regret
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper considers reinforcement learning problems with additional similarity information. We start with the simple setting of multi-armed bandits in which the learner knows the color of each arm, under the assumption that arms of the same color have close mean rewards. We present an algorithm showing that this color information can be used to improve the dependence of online regret bounds on the number of arms. Further, we discuss to what extent this approach extends to the more general case of Markov decision processes. For the simplest case, where a shared color means similar rewards and identical transition probabilities, an algorithm and a corresponding online regret bound are given. For the general case, where a shared color implies only close but not necessarily identical transition probabilities, we give upper and lower bounds on the error incurred by aggregating actions according to their color. These bounds also imply that the general case is considerably harder to handle.
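To make the aggregation idea from the abstract concrete, the following is a minimal UCB1-style sketch in which reward statistics are pooled per color rather than per arm, and each confidence bound is widened by a tolerance epsilon that is assumed to bound the spread of mean rewards within a color class. This is an illustrative sketch under those assumptions, not the paper's algorithm; the class name ColoredUCB, the epsilon parameter, and the round-robin rule within a color are inventions for the example.

import math

class ColoredUCB:
    """UCB1-style bandit that pools statistics per color class.

    Illustrative sketch only: arms of the same color are assumed to
    have mean rewards within `epsilon` of each other, so a pull of
    any arm of a color informs the whole class.
    """

    def __init__(self, colors, epsilon):
        # colors: list mapping arm index -> color label
        self.colors = colors
        self.epsilon = epsilon
        self.arms_of = {}  # color -> list of arm indices
        for arm, color in enumerate(colors):
            self.arms_of.setdefault(color, []).append(arm)
        self.pulls = {c: 0 for c in self.arms_of}     # pull count per color
        self.reward = {c: 0.0 for c in self.arms_of}  # reward sum per color
        self.t = 0

    def select_arm(self):
        self.t += 1
        # Play every color once before trusting the confidence bounds.
        for color, n in self.pulls.items():
            if n == 0:
                return self.arms_of[color][0]

        # UCB1 index per color, widened by epsilon to cover the
        # within-class spread of mean rewards.
        def index(color):
            mean = self.reward[color] / self.pulls[color]
            bonus = math.sqrt(2.0 * math.log(self.t) / self.pulls[color])
            return mean + bonus + self.epsilon

        best = max(self.arms_of, key=index)
        # Spread pulls over the arms of the chosen color (round-robin).
        arms = self.arms_of[best]
        return arms[self.pulls[best] % len(arms)]

    def update(self, arm, reward):
        color = self.colors[arm]
        self.pulls[color] += 1
        self.reward[color] += reward

Because confidence intervals are maintained for the number of color classes rather than the number of arms, the exploration term scales with the former; this is the kind of improved dependence on the number of arms the abstract refers to, bought at the price of an additive error of order epsilon per step.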
Pages
203-210 (8 pages)
Related Papers
50 in total
  • [21] Multi-armed bandits for performance marketing
    Gigli, Marco
    Stella, Fabio
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024
  • [22] Lenient Regret for Multi-Armed Bandits
    Merlis, Nadav
    Mannor, Shie
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8950 - 8957
  • [23] Finding structure in multi-armed bandits
    Schulz, Eric
    Franklin, Nicholas T.
    Gershman, Samuel J.
    COGNITIVE PSYCHOLOGY, 2020, 119
  • [24] ON MULTI-ARMED BANDITS AND DEBT COLLECTION
    Czekaj, Lukasz
    Biegus, Tomasz
    Kitlowski, Robert
    Tomasik, Pawel
    36TH ANNUAL EUROPEAN SIMULATION AND MODELLING CONFERENCE, ESM 2022, 2022, : 137 - 141
  • [25] Visualizations for interrogations of multi-armed bandits
    Keaton, Timothy J.
    Sabbaghi, Arman
    STAT, 2019, 8 (01)
  • [26] Multi-armed bandits with dependent arms
    Singh, Rahul
    Liu, Fang
    Sun, Yin
    Shroff, Ness
    MACHINE LEARNING, 2024, 113 (01) : 45 - 71
  • [27] On Kernelized Multi-Armed Bandits with Constraints
    Zhou, Xingyu
    Ji, Bo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [28] Multi-Armed Bandits in Metric Spaces
    Kleinberg, Robert
    Slivkins, Aleksandrs
    Upfal, Eli
    STOC'08: PROCEEDINGS OF THE 2008 ACM INTERNATIONAL SYMPOSIUM ON THEORY OF COMPUTING, 2008: 681+
  • [29] Multi-Armed Bandits With Costly Probes
    Elumar, Eray Can
    Tekin, Cem
    Yagan, Osman
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2025, 71 (01) : 618 - 643
  • [30] Multi-armed bandits with episode context
    Rosin, Christopher D.
    ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2011, 61 : 203 - 230