Mechanisms with learning for stochastic multi-armed bandit problems

Cited: 0
Authors
Shweta Jain
Satyanath Bhat
Ganesh Ghalme
Divya Padmanabhan
Y. Narahari
Institutions
[1] Indian Institute of Science, Department of Computer Science and Automation
Keywords
Multi-armed bandit; mechanism design; learning algorithms
DOI: not available
Abstract
The multi-armed bandit (MAB) problem is a widely studied problem in the machine learning literature in the context of online learning. In this article, our focus is on a specific class of problems, namely stochastic MAB problems, where the rewards are stochastic. In particular, we emphasize stochastic MAB problems with strategic agents. Dealing with strategic agents warrants the use of mechanism design principles in conjunction with online learning, and leads to non-trivial technical challenges. In this paper, we first provide three motivating problems arising from Internet advertising, crowdsourcing, and smart grids. Next, we provide an overview of stochastic MAB problems and key associated learning algorithms, including upper confidence bound (UCB) based algorithms. We provide proofs of important results related to the regret analysis of these learning algorithms. Following this, we present mechanism design for stochastic MAB problems. With the classic example of sponsored search auctions as a backdrop, we bring out key insights into important issues such as regret lower bounds, exploration-separated mechanisms, designing truthful mechanisms, UCB-based mechanisms, and the extension to multiple-pull MAB problems. Finally, we provide a bird's-eye view of recent results in the area and present a few issues that require immediate attention in future work.
Pages: 229–272
Page count: 43
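As a minimal illustration of the UCB-based learning the abstract refers to (this sketch is not taken from the paper itself), the standard UCB1 rule pulls each arm once and thereafter pulls the arm maximizing the empirical mean plus an exploration bonus. The reward function, arm means, and seed below are hypothetical:

```python
import math
import random

def ucb1(reward_fn, n_arms, horizon, seed=0):
    """Run UCB1 on a stochastic bandit with rewards in [0, 1].

    reward_fn(arm, rng) -> observed reward for pulling `arm`.
    Returns per-arm pull counts and empirical mean rewards.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    # Initialization: pull every arm once.
    for a in range(n_arms):
        counts[a] += 1
        sums[a] += reward_fn(a, rng)
    for t in range(n_arms, horizon):
        # Index = empirical mean + sqrt(2 ln t / n_a) exploration bonus.
        index = [sums[a] / counts[a]
                 + math.sqrt(2.0 * math.log(t + 1) / counts[a])
                 for a in range(n_arms)]
        a = max(range(n_arms), key=lambda i: index[i])
        counts[a] += 1
        sums[a] += reward_fn(a, rng)
    return counts, [sums[a] / counts[a] for a in range(n_arms)]

# Hypothetical example: two Bernoulli arms with means 0.3 and 0.7.
# Over 2000 rounds, UCB1 concentrates its pulls on the better arm.
means = [0.3, 0.7]
counts, est = ucb1(
    lambda a, rng: 1.0 if rng.random() < means[a] else 0.0,
    n_arms=2, horizon=2000)
```

The logarithmic bonus shrinks as an arm is pulled more, so suboptimal arms receive only O(log T) pulls, which is the source of the logarithmic regret bounds analyzed in the paper.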