Mechanisms with learning for stochastic multi-armed bandit problems

Citations: 0
Authors
Shweta Jain
Satyanath Bhat
Ganesh Ghalme
Divya Padmanabhan
Y. Narahari
Affiliations
[1] Indian Institute of Science, Department of Computer Science and Automation
Keywords
Multi-armed bandit; mechanism design; learning algorithms
DOI
Not available
Abstract
The multi-armed bandit (MAB) problem is a widely studied problem in the machine learning literature in the context of online learning. In this article, our focus is on a specific class of problems, namely stochastic MAB problems, where the rewards are stochastic. In particular, we emphasize stochastic MAB problems with strategic agents. Dealing with strategic agents warrants the use of mechanism design principles in conjunction with online learning, and leads to non-trivial technical challenges. In this paper, we first provide three motivating problems arising from Internet advertising, crowdsourcing, and smart grids. Next, we provide an overview of stochastic MAB problems and key associated learning algorithms, including upper confidence bound (UCB) based algorithms. We provide proofs of important results related to regret analysis of the above learning algorithms. Following this, we present mechanism design for stochastic MAB problems. With the classic example of sponsored search auctions as a backdrop, we bring out key insights into important issues such as regret lower bounds, exploration-separated mechanisms, designing truthful mechanisms, UCB based mechanisms, and the extension to multiple-pull MAB problems. Finally, we provide a bird's eye view of recent results in the area and present a few issues that require immediate future attention.
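The abstract refers to upper confidence bound (UCB) based learning algorithms. As a rough illustration only (not code from the paper itself), the following is a minimal sketch of the standard UCB1 index policy: pull each arm once, then repeatedly pull the arm maximizing empirical mean plus a confidence bonus. The Bernoulli reward functions and all names here are hypothetical choices for the example.

```python
import math
import random

def ucb1(reward_fns, horizon, seed=0):
    """Standard UCB1: after one initial pull per arm, pick the arm
    maximizing means[i] + sqrt(2 * ln(t) / counts[i])."""
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k     # number of pulls per arm
    means = [0.0] * k    # empirical mean reward per arm

    def pull(i):
        r = reward_fns[i](rng)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # incremental mean update

    for i in range(k):                 # initialization: one pull per arm
        pull(i)
    for t in range(k, horizon):        # main loop: pull arm with highest UCB index
        ucb = [means[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
               for i in range(k)]
        pull(max(range(k), key=lambda i: ucb[i]))
    return counts, means

# Hypothetical instance: two Bernoulli arms with means 0.9 and 0.4.
# UCB1 should concentrate its pulls on arm 0, the better arm.
arms = [lambda rng: 1.0 if rng.random() < 0.9 else 0.0,
        lambda rng: 1.0 if rng.random() < 0.4 else 0.0]
counts, means = ucb1(arms, horizon=2000)
```

Note that this sketch assumes non-strategic arms; the mechanisms surveyed in the paper additionally handle agents who may misreport, which UCB1 alone does not address.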
Pages: 229-272
Page count: 43
Related papers (50 records)
  • [1] MECHANISMS WITH LEARNING FOR STOCHASTIC MULTI-ARMED BANDIT PROBLEMS
    Jain, Shweta; Bhat, Satyanath; Ghalme, Ganesh; Padmanabhan, Divya; Narahari, Y.
    INDIAN JOURNAL OF PURE & APPLIED MATHEMATICS, 2016, 47(2): 229-272
  • [2] Thompson Sampling Based Mechanisms for Stochastic Multi-Armed Bandit Problems
    Ghalme, Ganesh; Jain, Shweta; Gujar, Sujit; Narahari, Y.
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017: 87-95
  • [3] Achieving Complete Learning in Multi-Armed Bandit Problems
    Vakili, Sattar; Zhao, Qing
    2013 ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, 2013: 1778-1782
  • [4] The Multi-Armed Bandit With Stochastic Plays
    Lesage-Landry, Antoine; Taylor, Joshua A.
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63(7): 2280-2286
  • [5] Time-Varying Stochastic Multi-Armed Bandit Problems
    Vakili, Sattar; Zhao, Qing; Zhou, Yuan
    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2014: 2103-2107
  • [6] On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems
    Kim, Baekjin; Tewari, Ambuj
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [7] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
    Bubeck, Sebastien; Cesa-Bianchi, Nicolo
    FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2012, 5(1): 1-122
  • [8] Satisficing in Multi-Armed Bandit Problems
    Reverdy, Paul; Srivastava, Vaibhav; Leonard, Naomi Ehrich
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2017, 62(8): 3788-3803
  • [9] Characterizing Truthful Multi-Armed Bandit Mechanisms
    Babaioff, Moshe; Sharma, Yogeshwer; Slivkins, Aleksandrs
    10TH ACM CONFERENCE ON ELECTRONIC COMMERCE - EC 2009, 2009: 79-88
  • [10] CHARACTERIZING TRUTHFUL MULTI-ARMED BANDIT MECHANISMS
    Babaioff, Moshe; Sharma, Yogeshwer; Slivkins, Aleksandrs
    SIAM JOURNAL ON COMPUTING, 2014, 43(1): 194-230