Risk-Averse Allocation Indices for Multiarmed Bandit Problem

Cited by: 4

Authors:

Malekipirbazari, Milad [1]

Cavus, Ozlem [1]

Affiliations:

[1] Bilkent Univ, Dept Ind Engn, TR-06800 Ankara, Turkey

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2021 / Vol. 66 / Issue 11

Keywords:

Markov processes; Indexes; Resource management; Heuristic algorithms; Dynamic scheduling; Routing; Random variables; Coherent risk measures; dynamic allocation index; dynamic risk-aversion; Gittins index; multiarmed bandit (MAB);

DOI:

10.1109/TAC.2021.3053539

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In the classical multiarmed bandit problem, the aim is to find a policy maximizing the expected total reward, implicitly assuming that the decision-maker is risk-neutral. In some real-life applications, however, decision-makers are risk-averse. In this article, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multiarmed bandit problem in this novel setting and propose a priority-index heuristic that yields risk-averse allocation indices with a structure similar to the Gittins index. Although an optimal policy is shown not always to have an index-based form, empirical results demonstrate the strong performance of this heuristic and show that risk-averse allocation indices can achieve optimal or near-optimal interpretable policies.

Pages: 5522-5529

Page count: 8
