Piecewise Stationary Bandits under Risk Criteria

Cited: 0
Authors
Bhatt, Sujay [1 ]
Fang, Guanhua [2 ]
Li, Ping [3 ]
Affiliations
[1] JP Morgan AI Res, New York, NY 10017 USA
[2] Fudan Univ, Sch Management, Shanghai, Peoples R China
[3] LinkedIn Ads, Bellevue, WA 98004 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Piecewise stationary stochastic multi-armed bandits have been extensively explored in the risk-neutral and sub-Gaussian setting. In this work, we consider a multi-armed bandit framework in which the reward distributions are heavy-tailed and non-stationary, and we evaluate the performance of algorithms using general risk criteria. Specifically, we make the following contributions: (i) We first propose a non-parametric change detection algorithm that can detect general distributional changes in heavy-tailed distributions. (ii) We then propose a truncation-based UCB-type bandit algorithm that integrates the above regime-change detection algorithm to minimize the regret of the non-stationary learning problem. (iii) Finally, we establish regret bounds for the proposed bandit algorithm by characterizing the statistical properties of the general change detection algorithm, along with a novel regret analysis.
Pages: 23
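For intuition about the recipe outlined in the abstract, below is a minimal, purely illustrative Python sketch, not the paper's actual algorithm: it pairs a truncated-empirical-mean UCB index (a standard device for heavy-tailed rewards) with a simple KS-type two-sample change detector that restarts the learner when an arm's reward distribution appears to shift. The class name, the moment-bound parameter u, the confidence parameter delta, the window length, and the detection threshold are all assumptions made for this example.

import numpy as np

class TruncatedUCBWithRestarts:
    """Illustrative only: a hypothetical truncation-based UCB with restart-on-change."""

    def __init__(self, n_arms, u=2.0, delta=0.05, window=200, detect_threshold=0.3):
        self.n_arms = n_arms
        self.u = u                                # assumed bound on the second raw moment
        self.delta = delta                        # confidence level for the truncation schedule
        self.window = window                      # per-arm samples used by the change detector
        self.detect_threshold = detect_threshold  # placeholder KS-statistic threshold
        self.reset()

    def reset(self):
        # Restart strategy: discard all history once a change is declared.
        self.rewards = [[] for _ in range(self.n_arms)]

    def _truncated_mean(self, xs):
        # Truncate the s-th sample at a level growing with s, so occasional
        # heavy-tailed outliers cannot dominate the empirical mean.
        xs = np.asarray(xs, dtype=float)
        s = np.arange(1, len(xs) + 1)
        levels = np.sqrt(self.u * s / np.log(1.0 / self.delta))
        return np.where(np.abs(xs) <= levels, xs, 0.0).mean()

    def select_arm(self):
        # Pull each arm once, then maximize the truncated-mean index plus a bonus.
        for a in range(self.n_arms):
            if not self.rewards[a]:
                return a
        t = sum(len(r) for r in self.rewards)
        scores = []
        for a in range(self.n_arms):
            n = len(self.rewards[a])
            bonus = np.sqrt(4.0 * self.u * np.log(max(t, 2)) / n)
            scores.append(self._truncated_mean(self.rewards[a]) + bonus)
        return int(np.argmax(scores))

    def _change_detected(self, xs):
        # Non-parametric two-sample check: compare empirical CDFs of the first
        # and second half of the most recent window (a KS-type statistic).
        if len(xs) < self.window:
            return False
        recent = np.asarray(xs[-self.window:], dtype=float)
        half = self.window // 2
        a, b = np.sort(recent[:half]), np.sort(recent[half:])
        grid = np.concatenate([a, b])
        cdf_a = np.searchsorted(a, grid, side="right") / len(a)
        cdf_b = np.searchsorted(b, grid, side="right") / len(b)
        return float(np.max(np.abs(cdf_a - cdf_b))) > self.detect_threshold

    def update(self, arm, reward):
        self.rewards[arm].append(reward)
        if self._change_detected(self.rewards[arm]):
            self.reset()

A restart-on-detection design like this is only one way to couple a change detector with a bandit index; the paper's detector, index, and tuning may differ, and the regret guarantees described in the abstract apply to the authors' algorithm, not to this sketch.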