Risk-Averse Allocation Indices for Multiarmed Bandit Problem

Cited by: 4
Authors
Malekipirbazari, Milad [1]
Cavus, Ozlem [1]
Affiliations
[1] Bilkent Univ, Dept Ind Engn, TR-06800 Ankara, Turkey
Keywords
Markov processes; Indexes; Resource management; Heuristic algorithms; Dynamic scheduling; Routing; Random variables; Coherent risk measures; dynamic allocation index; dynamic risk-aversion; Gittins index; multiarmed bandit (MAB)
DOI
10.1109/TAC.2021.3053539
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the classical multiarmed bandit problem, the aim is to find a policy that maximizes the expected total reward, which implicitly assumes that the decision-maker is risk-neutral. In some real-life applications, however, decision-makers are risk-averse. In this article, we design a new setting based on the concept of dynamic risk measures, in which the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multiarmed bandit problem in this novel setting and propose a priority-index heuristic that yields risk-averse allocation indices with a structure similar to the Gittins index. Although an optimal policy is shown not always to have an index-based form, empirical results indicate that the heuristic performs well and that risk-averse allocation indices can achieve optimal or near-optimal interpretable policies.
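For orientation only, the classical (risk-neutral) Gittins index that the abstract uses as its structural reference point can be computed for a single finite-state Markov arm with the well-known restart-in-state formulation. The sketch below is not the article's risk-averse index. The function name `gittins_index_restart`, the `evaluate` hook, and the toy arm are assumptions introduced here for illustration. The hook defaults to the plain expectation; swapping in a one-step risk mapping is a hypothetical experiment, not the authors' method.

```python
import numpy as np

def expectation(probs, values):
    """Risk-neutral one-step evaluation: the ordinary expected value."""
    return float(np.dot(probs, values))

def gittins_index_restart(P, r, beta, state, evaluate=expectation,
                          tol=1e-10, max_iter=100_000):
    """Classical Gittins index of `state` via the restart-in-state formulation.

    P: (n, n) transition matrix of one arm, r: (n,) rewards, 0 < beta < 1.
    In every state the player may either continue from the current state
    or restart from `state`; the index is (1 - beta) times the optimal
    value at `state`. `evaluate` is a hypothetical hook (an assumption).
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    V = np.zeros(n)
    for _ in range(max_iter):
        # Value of continuing from each state j.
        cont = np.array([r[j] + beta * evaluate(P[j], V) for j in range(n)])
        # Restarting means acting as if the arm were in `state`.
        V_new = np.maximum(cont, cont[state])
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return (1.0 - beta) * V[state]

# Toy arm (assumed): state 1 pays 0 and moves to state 0, which pays 1 forever.
P = [[1.0, 0.0],
     [1.0, 0.0]]
r = [1.0, 0.0]
idx0 = gittins_index_restart(P, r, beta=0.9, state=0)
idx1 = gittins_index_restart(P, r, beta=0.9, state=1)
```

In this toy arm the absorbing state's index equals its reward, and the index of the zero-reward state equals the discount factor, which matches the stopping-time definition of the index.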
Pages: 5522-5529
Number of pages: 8