SEQUENTIAL MONTE CARLO BANDITS

Authors
Urteaga, Inigo [1, 2]
Wiggins, Chris H. [3]
Affiliations
[1] BCAM Basque Ctr Appl Math, Bilbao, Spain
[2] Basque Fdn Sci, Ikerbasque, Bilbao, Spain
[3] Columbia Univ, Dept Appl Phys & Appl Math, New York, NY USA
Keywords
Sequential Monte Carlo; multi-armed bandits; restless bandits; linear dynamical systems; nonlinear reward functions; hidden ARMA processes; parameter estimation; particle filters; Bayesian estimation; state; allocation
DOI
10.3934/fods.2024005
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
We extend state-of-the-art Bayesian multi-armed bandit (MAB) algorithms beyond their original setting by making use of sequential Monte Carlo (SMC) methods. A MAB is a sequential decision-making problem in which the goal is to learn a policy that maximizes long-term payoff, and only the reward of the executed action is observed. In the stochastic MAB, the reward for each action is generated from an unknown distribution, often assumed to be stationary. To decide which action to take next, a MAB agent must learn the characteristics of the unknown reward distribution, e.g., compute its sufficient statistics. However, these statistics are analytically intractable except in simple, stationary cases. We here use SMC to estimate the statistics that Bayesian MAB agents compute, and devise flexible policies that can address a rich class of bandit problems, namely MABs with nonlinear, stateless and context-dependent reward distributions that evolve over time. We showcase how non-stationary bandits, where time dynamics are modeled via linear dynamical systems, can be successfully addressed by SMC-based Bayesian bandit agents. We empirically demonstrate good regret performance of the proposed SMC-based bandit policies in several MAB scenarios that have remained elusive, namely non-stationary bandits with nonlinear rewards.
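As a rough illustration of the idea in the abstract, the sketch below pairs Thompson sampling with a generic bootstrap particle filter on a toy restless bandit. It is not the authors' algorithm: the function name `smc_thompson_bandit`, the scalar AR(1) latent dynamics, the Gaussian reward model, and all parameter values (`a`, `q`, `r`) are assumptions chosen only to keep the example small.

```python
import numpy as np

def smc_thompson_bandit(n_arms=2, n_steps=200, n_particles=500,
                        a=0.95, q=0.1, r=0.5, seed=0):
    """Thompson sampling with one bootstrap particle filter per arm.

    Assumed toy model (not taken from the paper): each arm k has a latent
    scalar mean z_k with z_k[t] = a * z_k[t-1] + N(0, q^2), and the reward
    of the played arm is y = z_k[t] + N(0, r^2).
    """
    rng = np.random.default_rng(seed)
    z_true = rng.normal(0.0, 1.0, size=n_arms)                    # hidden states
    particles = rng.normal(0.0, 1.0, size=(n_arms, n_particles))  # prior draws
    actions = np.empty(n_steps, dtype=int)
    rewards = np.empty(n_steps)

    for t in range(n_steps):
        # Every arm evolves whether or not it is played (restless bandit).
        z_true = a * z_true + rng.normal(0.0, q, size=n_arms)
        # Propagate all particles through the same transition density.
        particles = a * particles + rng.normal(0.0, q, size=particles.shape)

        # Thompson sampling: draw one (equally weighted) particle per arm
        # as a posterior sample and play the arm with the largest draw.
        idx = rng.integers(n_particles, size=n_arms)
        sampled = particles[np.arange(n_arms), idx]
        k = int(np.argmax(sampled))

        # Only the reward of the executed action is observed.
        y = z_true[k] + rng.normal(0.0, r)

        # Weight the played arm's particles by the Gaussian likelihood,
        # then resample so the particles are equally weighted again.
        log_w = -0.5 * ((y - particles[k]) / r) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        particles[k] = rng.choice(particles[k], size=n_particles, p=w)

        actions[t], rewards[t] = k, y

    return actions, rewards, particles
```

Calling `smc_thompson_bandit(n_steps=50, n_particles=100)` returns the played arms, the observed rewards, and the final particle sets. Resampling at every step keeps the code short; a more careful filter would typically resample only when the effective sample size drops.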
Pages: 57