Piecewise Stationary Bandits under Risk Criteria

Cited: 0
Authors
Bhatt, Sujay [1 ]
Fang, Guanhua [2 ]
Li, Ping [3 ]
Affiliations
[1] JP Morgan AI Res, New York, NY 10017 USA
[2] Fudan Univ, Sch Management, Shanghai, Peoples R China
[3] LinkedIn Ads, Bellevue, WA 98004 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Piecewise stationary stochastic multi-armed bandits have been extensively explored in the risk-neutral and sub-Gaussian setting. In this work, we consider a multi-armed bandit framework in which the reward distributions are heavy-tailed and non-stationary, and we evaluate the performance of algorithms using general risk criteria. Specifically, we make the following contributions: (i) We first propose a non-parametric change detection algorithm that can detect general distributional changes in heavy-tailed distributions. (ii) We then propose a truncation-based UCB-type bandit algorithm that integrates the above regime-change detection algorithm to minimize the regret of the non-stationary learning problem. (iii) Finally, we establish regret bounds for the proposed bandit algorithm by characterizing the statistical properties of the general change detection algorithm, along with a novel regret analysis.
Pages: 23
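For intuition about the recipe outlined in the abstract, below is a minimal, purely illustrative Python sketch, not the paper's actual algorithm: it pairs a truncated-empirical-mean UCB index (a standard device for heavy-tailed rewards) with a simple KS-type two-sample change detector that restarts the learner when an arm's reward distribution appears to shift. The class name, the moment-bound parameter u, the confidence parameter delta, the window length, and the detection threshold are all assumptions made for this example.

import numpy as np

class TruncatedUCBWithRestarts:
    """Illustrative only: a hypothetical truncation-based UCB with restart-on-change."""

    def __init__(self, n_arms, u=2.0, delta=0.05, window=200, detect_threshold=0.3):
        self.n_arms = n_arms
        self.u = u                                # assumed bound on the second raw moment
        self.delta = delta                        # confidence level for the truncation schedule
        self.window = window                      # per-arm samples used by the change detector
        self.detect_threshold = detect_threshold  # placeholder KS-statistic threshold
        self.reset()

    def reset(self):
        # Restart strategy: discard all history once a change is declared.
        self.rewards = [[] for _ in range(self.n_arms)]

    def _truncated_mean(self, xs):
        # Truncate the s-th sample at a level growing with s, so occasional
        # heavy-tailed outliers cannot dominate the empirical mean.
        xs = np.asarray(xs, dtype=float)
        s = np.arange(1, len(xs) + 1)
        levels = np.sqrt(self.u * s / np.log(1.0 / self.delta))
        return np.where(np.abs(xs) <= levels, xs, 0.0).mean()

    def select_arm(self):
        # Pull each arm once, then maximize the truncated-mean index plus a bonus.
        for a in range(self.n_arms):
            if not self.rewards[a]:
                return a
        t = sum(len(r) for r in self.rewards)
        scores = []
        for a in range(self.n_arms):
            n = len(self.rewards[a])
            bonus = np.sqrt(4.0 * self.u * np.log(max(t, 2)) / n)
            scores.append(self._truncated_mean(self.rewards[a]) + bonus)
        return int(np.argmax(scores))

    def _change_detected(self, xs):
        # Non-parametric two-sample check: compare empirical CDFs of the first
        # and second half of the most recent window (a KS-type statistic).
        if len(xs) < self.window:
            return False
        recent = np.asarray(xs[-self.window:], dtype=float)
        half = self.window // 2
        a, b = np.sort(recent[:half]), np.sort(recent[half:])
        grid = np.concatenate([a, b])
        cdf_a = np.searchsorted(a, grid, side="right") / len(a)
        cdf_b = np.searchsorted(b, grid, side="right") / len(b)
        return float(np.max(np.abs(cdf_a - cdf_b))) > self.detect_threshold

    def update(self, arm, reward):
        self.rewards[arm].append(reward)
        if self._change_detected(self.rewards[arm]):
            self.reset()

A restart-on-detection design like this is only one way to couple a change detector with a bandit index; the paper's detector, index, and tuning may differ, and the regret guarantees described in the abstract apply to the authors' algorithm, not to this sketch.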