Mean-Variance and Value at Risk in Multi-Armed Bandit Problems

Cited by: 0

Authors:

Vakili, Sattar [1]

Zhao, Qing [1]

Affiliation:

[1] Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY 14850 USA

Source:

2015 53RD ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON) | 2015

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We study risk-averse multi-armed bandit problems under different risk measures. We consider three risk mitigation models. In the first model, the variations in the reward values obtained at different times are considered as risk and the objective is to minimize the mean-variance of the observed rewards. In the second and the third models, the quantity of interest is the total reward at the end of the time horizon, and the objective is to minimize the mean-variance and maximize the value at risk of the total reward, respectively. We develop risk-averse online learning policies and analyze their regret performance. We also provide tight lower bounds on regret under the model of mean-variance of observations.
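The two risk measures named in the abstract can be illustrated with a minimal sketch. The record does not give the paper's exact definitions, so the forms below are assumptions: mean-variance is taken as variance minus a risk-tolerance factor `rho` times the mean (lower is better), and value at risk at level `alpha` is taken as a lower empirical quantile of the total reward (higher is better). The function names, `rho`, and `alpha` are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pvariance


def empirical_mean_variance(rewards, rho):
    """Assumed form: population variance minus rho times the mean.

    A smaller value indicates a better risk/return trade-off.
    """
    return pvariance(rewards) - rho * fmean(rewards)


def empirical_value_at_risk(total_rewards, alpha):
    """Assumed form: a lower empirical quantile of sampled total rewards.

    Returns a sampled level such that at most a fraction alpha of the
    samples fall strictly below it.
    """
    ordered = sorted(total_rewards)
    # Clamp the index so alpha = 1.0 still maps to a valid sample.
    k = min(int(alpha * len(ordered)), len(ordered) - 1)
    return ordered[k]


if __name__ == "__main__":
    # Per-round rewards observed from one arm (made-up numbers).
    print(empirical_mean_variance([1.0, 2.0, 3.0], rho=1.0))
    # Total rewards from several independent runs (made-up numbers).
    print(empirical_value_at_risk([10, 2, 8, 4, 6], alpha=0.5))
```

Under these assumptions, the first measure penalizes round-to-round fluctuation of the observed rewards, while the second looks only at the distribution of the total reward at the end of the horizon.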


Pages: 1330-1335

Number of pages: 6

