Risk-Averse Allocation Indices for Multiarmed Bandit Problem

Times cited: 4
Authors
Malekipirbazari, Milad [1]
Cavus, Ozlem [1]
Affiliation
[1] Bilkent Univ, Dept Ind Engn, TR-06800 Ankara, Turkey
Keywords
Markov processes; Indexes; Resource management; Heuristic algorithms; Dynamic scheduling; Routing; Random variables; Coherent risk measures; dynamic allocation index; dynamic risk-aversion; Gittins index; multiarmed bandit (MAB)
DOI
10.1109/TAC.2021.3053539
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In the classical multiarmed bandit problem, the aim is to find a policy that maximizes the expected total reward, which implicitly assumes that the decision-maker is risk-neutral. In some real-life applications, however, decision-makers are risk-averse. In this article, we design a new setting based on the concept of dynamic risk measures, in which the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multiarmed bandit problem in this novel setting and propose a priority-index heuristic that yields risk-averse allocation indices with a structure similar to that of the Gittins index. Although an optimal policy is shown not always to have an index-based form, empirical results demonstrate the strong performance of this heuristic and show that risk-averse allocation indices can yield optimal or near-optimal, interpretable policies.
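The abstract describes index policies whose structure resembles the Gittins index, evaluated under a dynamic risk measure. As a rough illustration only, the Python sketch below computes a Gittins-style index for a single finite-state arm through the retirement (calibration) characterization: it searches for the smallest retirement reward M at which retiring is optimal and reports (1 - beta) * M. To make it risk-averse, the expectation in the Bellman recursion is replaced by a one-step mean-lower-semideviation mapping with weight kappa, applied recursively so that it acts as a nested dynamic risk measure. The choice of risk mapping, the function names (`lower_semideviation_risk`, `retirement_values`, `risk_averse_index`), the parameters and the toy arm are all assumptions made for illustration; this is not the article's index definition or algorithm.

```python
from typing import List, Sequence

# Hypothetical helper names; a sketch under stated assumptions, not the article's algorithm.
Matrix = Sequence[Sequence[float]]


def lower_semideviation_risk(probs: Sequence[float], values: Sequence[float],
                             kappa: float) -> float:
    """Assumed one-step risk mapping for rewards:
    rho(Z) = E[Z] - kappa * E[(E[Z] - Z)_+].
    kappa = 0 gives the plain expectation; for 0 <= kappa <= 1 the mapping is
    monotone and translation-equivariant, and never exceeds E[Z]."""
    mean = sum(p * v for p, v in zip(probs, values))
    shortfall = sum(p * max(mean - v, 0.0) for p, v in zip(probs, values))
    return mean - kappa * shortfall


def retirement_values(rewards: Sequence[float], P: Matrix, beta: float,
                      M: float, kappa: float, tol: float = 1e-10,
                      max_iter: int = 100_000) -> List[float]:
    """Value iteration for V(s) = max(M, r(s) + beta * rho_s(V)), where rho_s
    is the risk mapping taken under the transition row P[s]."""
    n = len(rewards)
    V = [M] * n
    for _ in range(max_iter):
        V_new = [max(M, rewards[s] + beta * lower_semideviation_risk(P[s], V, kappa))
                 for s in range(n)]
        gap = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if gap < tol:  # the operator is a beta-contraction, so this terminates
            break
    return V


def risk_averse_index(rewards: Sequence[float], P: Matrix, beta: float,
                      state: int, kappa: float = 0.0, iters: int = 60) -> float:
    """Bisection on the retirement reward M; returns (1 - beta) * M*, where M*
    is the smallest M at which retiring in `state` is optimal."""
    lo = min(rewards) / (1.0 - beta)  # continuing is weakly optimal at this M
    hi = max(rewards) / (1.0 - beta)  # retiring is optimal at this M
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        V = retirement_values(rewards, P, beta, mid, kappa)
        if V[state] - mid > 1e-9:
            lo = mid  # continuing still beats retiring
        else:
            hi = mid  # retiring is (numerically) optimal
    return (1.0 - beta) * hi


if __name__ == "__main__":
    # Toy arm: state 0 pays 0 and moves to absorbing state 1 (pays 2)
    # or absorbing state 2 (pays 0) with probability 0.5 each.
    rewards = [0.0, 2.0, 0.0]
    P = [[0.0, 0.5, 0.5], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(risk_averse_index(rewards, P, 0.9, 0))             # about 1.636 (= 9 / 5.5)
    print(risk_averse_index(rewards, P, 0.9, 0, kappa=0.5))  # about 1.543
```

With kappa = 0 the recursion is the usual risk-neutral calibration, and on the toy arm the index of state 0 equals the classical reward-to-discounted-time ratio 9 / 5.5. Any kappa in (0, 1] cannot raise the index, because the assumed risk mapping never exceeds the expectation.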
Pages: 5522-5529
Number of pages: 8