Exploration-exploitation tradeoff using variance estimates in multi-armed bandits

Citations: 296
Authors
Audibert, Jean-Yves [1 ,2 ]
Munos, Remi [3 ]
Szepesvari, Csaba [4 ]
Affiliations
[1] Univ Paris Est, Ecole Ponts ParisTech, CERTIS, F-77455 Marne La Vallee, France
[2] Willow ENS INRIA, F-75005 Paris, France
[3] INRIA Lille Nord Europe, SequeL Project, F-59650 Villeneuve d'Ascq, France
[4] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Exploration-exploitation tradeoff; Multi-armed bandits; Bernstein's inequality; High-probability bound; Risk analysis;
DOI
10.1016/j.tcs.2009.01.016
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Algorithms based on upper confidence bounds for balancing exploration and exploitation are gaining popularity, since they are easy to implement, efficient and effective. This paper considers a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. We provide the first analysis of the expected regret for such algorithms. As expected, our results show that the algorithm that uses the variance estimates has a major advantage over its alternatives that do not use such estimates, provided that the variances of the payoffs of the suboptimal arms are low. We also prove that the regret concentrates only at a polynomial rate. This holds for all upper-confidence-bound based algorithms and for all bandit problems, except those special ones where, with probability one, the payoff obtained by pulling the optimal arm is larger than the expected payoff of the second best arm. Hence, although upper confidence bound bandit algorithms achieve logarithmic expected regret rates, they might not be suitable for a risk-averse decision maker. We illustrate some of the results by computer simulations. (C) 2009 Elsevier B.V. All rights reserved.
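The variance-aware index described in the abstract can be sketched as follows. This is a minimal illustration of a UCB-V-style policy on Bernoulli arms, not the authors' implementation: the exploration constants (`zeta`, the payoff range `b`) and all function names are illustrative assumptions, and the confidence width simply follows the Bernstein-type shape (a term scaling with the empirical variance plus a term scaling with the range).

```python
import math
import random

def ucb_v_index(mean, var, s, t, b=1.0, zeta=1.2):
    # Bernstein-type index: empirical mean plus a confidence width
    # that shrinks when the empirical variance of the arm is low.
    # Constants b (payoff range) and zeta are illustrative choices.
    e = zeta * math.log(t)
    return mean + math.sqrt(2.0 * var * e / s) + 3.0 * b * e / s

def ucb_v(arm_means, horizon, seed=0):
    # Run a UCB-V-style policy on Bernoulli arms with the given means;
    # returns how often each arm was pulled.
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k

    def pull(i):
        r = 1.0 if rng.random() < arm_means[i] else 0.0
        counts[i] += 1
        sums[i] += r
        sq_sums[i] += r * r

    for i in range(k):              # initialize: pull each arm once
        pull(i)
    for t in range(k + 1, horizon + 1):
        best_i, best_val = 0, -math.inf
        for i in range(k):
            m = sums[i] / counts[i]
            v = max(sq_sums[i] / counts[i] - m * m, 0.0)  # empirical variance
            idx = ucb_v_index(m, v, counts[i], t)
            if idx > best_val:
                best_i, best_val = i, idx
        pull(best_i)
    return counts
```

With a clearly better arm (say means 0.9 vs. 0.1), the policy concentrates its pulls on the optimal arm while the confidence widths of the suboptimal, low-variance arms shrink quickly.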
Pages: 1876-1902
Page count: 27
Related papers
50 in total
  • [1] Multi-Armed Bandits for Minesweeper: Profiting From Exploration-Exploitation Synergy
    Lordeiro, Igor Q.
    Haddad, Diego B.
    Cardoso, Douglas O.
    IEEE TRANSACTIONS ON GAMES, 2022, 14 (03) : 403 - 412
  • [2] Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment
    Yang, Zixian
    Liu, Xin
    Ying, Lei
    2022 58TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2022,
  • [3] Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits
    Wu, Huasen
    Guo, Xueying
    Liu, Xin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [4] Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment
    Yang, Zixian
    Liu, Xin
    Ying, Lei
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 55
  • [5] Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits
    Rouyer, Chloe
    Seldin, Yevgeny
    CONFERENCE ON LEARNING THEORY, VOL 125, 2020, 125
  • [6] Decentralized Exploration in Multi-Armed Bandits
    Feraud, Raphael
    Alami, Reda
    Laroche, Romain
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [7] On Interruptible Pure Exploration in Multi-Armed Bandits
    Shleyfman, Alexander
    Komenda, Antonin
    Domshlak, Carmel
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 3592 - 3598
  • [8] Quantum Exploration Algorithms for Multi-Armed Bandits
    Wang, Daochen
    You, Xuchen
    Li, Tongyang
    Childs, Andrew M.
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10102 - 10110
  • [9] Combinatorial Pure Exploration of Multi-Armed Bandits
    Chen, Shouyuan
    Lin, Tian
    King, Irwin
    Lyu, Michael R.
    Chen, Wei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [10] Pure Exploration in Multi-armed Bandits Problems
    Bubeck, Sebastien
    Munos, Remi
    Stoltz, Gilles
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2009, 5809 : 23 - +