Risk-Averse Allocation Indices for Multiarmed Bandit Problem

Cited by: 4

Authors:

Malekipirbazari, Milad [1]

Cavus, Ozlem [1]

Affiliations:

[1] Bilkent Univ, Dept Ind Engn, TR-06800 Ankara, Turkey

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2021 / Vol. 66 / Issue 11

Keywords:

Markov processes; Indexes; Resource management; Heuristic algorithms; Dynamic scheduling; Routing; Random variables; Coherent risk measures; dynamic allocation index; dynamic risk-aversion; Gittins index; multiarmed bandit (MAB);

DOI:

10.1109/TAC.2021.3053539

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In the classical multiarmed bandit problem, the aim is to find a policy maximizing the expected total reward, implicitly assuming that the decision-maker is risk-neutral. In some real-life applications, however, decision-makers are risk-averse. In this article, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multiarmed bandit problem with respect to this novel setting and propose a priority-index heuristic that gives risk-averse allocation indices having a structure similar to the Gittins index. Although an optimal policy is shown not always to have an index-based form, empirical results demonstrate the strong performance of this heuristic and show that risk-averse allocation indices can achieve optimal or near-optimal interpretable policies.

Pages: 5522-5529

Page count: 8

50 items in total

[1] Risk-Averse Stochastic Convex Bandit
Cardoso, Adrian Rivera
Xu, Huan
22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89 : 39 - 47
[2] ADAPTIVE TREATMENT ALLOCATION AND THE MULTIARMED BANDIT PROBLEM
LAI, TL
ANNALS OF STATISTICS, 1987, 15 (03) : 1091 - 1114
[3] Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
Lin, Yifan
Wang, Yuhao
Zhou, Enlu
JOURNAL OF SYSTEMS SCIENCE AND SYSTEMS ENGINEERING, 2022
[4] Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
Lin, Yifan
Wang, Yuhao
Zhou, Enlu
JOURNAL OF SYSTEMS SCIENCE AND SYSTEMS ENGINEERING, 2023, 32 (03) : 267 - 288
[5] Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs
Yifan Lin
Yuhao Wang
Enlu Zhou
Journal of Systems Science and Systems Engineering, 2023, 32 : 267 - 288
[6] Risk-Averse Trees for Learning from Logged Bandit Feedback
Trovo, Francesco
Paladino, Stefano
Simone, Paolo
Restelli, Marcello
Gatti, Nicola
2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017 : 976 - 983
[7] The risk-averse ultimate pit problem
Canessa, Gianpiero
Moreno, Eduardo
Pagnoncelli, Bernardo K.
OPTIMIZATION AND ENGINEERING, 2021, 22 (04) : 2655 - 2678
[8] The nonstochastic multiarmed bandit problem
Auer, P
Cesa-Bianchi, N
Freund, Y
Schapire, RE
SIAM JOURNAL ON COMPUTING, 2003, 32 (01) : 48 - 77
[9] The Irrevocable Multiarmed Bandit Problem
Farias, Vivek F.
Madan, Ritesh
OPERATIONS RESEARCH, 2011, 59 (02) : 383 - 399
[10] The risk-averse ultimate pit problem
Gianpiero Canessa
Eduardo Moreno
Bernardo K. Pagnoncelli
Optimization and Engineering, 2021, 22 : 2655 - 2678