SEQUENTIAL MONTE CARLO BANDITS

Times cited: 0
Authors
Urteaga, Inigo [1, 2]
Wiggins, Chris H. [3]
Affiliations
[1] BCAM Basque Ctr Appl Math, Bilbao, Spain
[2] Basque Fdn Sci, Ikerbasque, Bilbao, Spain
[3] Columbia Univ, Dept Appl Phys & Appl Math, New York, NY USA
Keywords
Sequential Monte Carlo; multi-armed bandits; restless bandits; linear dynamical systems; nonlinear reward functions; HIDDEN ARMA PROCESSES; PARAMETER-ESTIMATION; PARTICLE FILTERS; BAYESIAN-ESTIMATION; STATE; ALLOCATION;
DOI
10.3934/fods.2024005
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
We extend state-of-the-art Bayesian multi-armed bandit (MAB) algorithms beyond their original setting by using sequential Monte Carlo (SMC) methods. A MAB is a sequential decision-making problem in which the goal is to learn a policy that maximizes long-term payoff, and only the reward of the executed action is observed. In the stochastic MAB, the reward for each action is generated from an unknown distribution, often assumed to be stationary. To decide which action to take next, a MAB agent must learn the characteristics of the unknown reward distribution, e.g., compute its sufficient statistics. However, closed-form expressions for these statistics are intractable except in simple, stationary cases. Here we use SMC to estimate the statistics that Bayesian MAB agents compute, and devise flexible policies that can address a rich class of bandit problems, namely MABs with nonlinear, stateless and context-dependent reward distributions that evolve over time. We showcase how non-stationary bandits, where time dynamics are modeled via linear dynamical systems, can be successfully addressed by SMC-based Bayesian bandit agents. We empirically demonstrate good regret performance of the proposed SMC-based bandit policies in several MAB scenarios that have remained elusive, i.e., in non-stationary bandits with nonlinear rewards.
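The idea summarized in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a two-armed bandit whose latent arm states follow an AR(1) process (a simple linear dynamical system) and whose rewards are Bernoulli with a logistic link (a nonlinear reward function). The names `ParticleArm` and `run`, the particle count, and the dynamics parameters `a` and `sigma` are all hypothetical choices made for illustration. Each arm keeps a bootstrap particle filter over its latent state, and the agent acts by Thompson sampling from the particle approximation.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic link from latent state to success probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class ParticleArm:
    """Bootstrap particle filter over one arm's latent state (hypothetical helper)."""

    def __init__(self, rng, n_particles=200, a=0.95, sigma=0.3):
        self.rng = rng
        self.a = a          # assumed AR(1) coefficient of the latent dynamics
        self.sigma = sigma  # assumed std. dev. of the transition noise
        self.particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]

    def propagate(self):
        # Move every particle through the assumed linear-Gaussian transition.
        self.particles = [self.a * p + self.rng.gauss(0.0, self.sigma)
                          for p in self.particles]

    def sample_expected_reward(self):
        # Thompson sampling: one draw of the expected reward from the particles.
        return sigmoid(self.rng.choice(self.particles))

    def update(self, reward):
        # Weight particles by the Bernoulli likelihood of the observed reward,
        # then resample (multinomial) back to an equally weighted set.
        weights = [sigmoid(p) if reward == 1 else 1.0 - sigmoid(p)
                   for p in self.particles]
        self.particles = self.rng.choices(self.particles, weights=weights,
                                          k=len(self.particles))


def run(n_steps=500, seed=0):
    """Simulate the agent on a toy restless bandit; returns (pulls per arm, total reward)."""
    rng = random.Random(seed)
    true_states = [1.0, -1.0]  # toy initial latent states, one per arm
    arms = [ParticleArm(rng) for _ in true_states]
    pulls = [0] * len(arms)
    total = 0
    for _ in range(n_steps):
        # The environment evolves whether or not an arm is played (restless bandit).
        true_states = [0.95 * s + rng.gauss(0.0, 0.3) for s in true_states]
        for arm in arms:
            arm.propagate()
        draws = [arm.sample_expected_reward() for arm in arms]
        chosen = max(range(len(arms)), key=draws.__getitem__)
        reward = 1 if rng.random() < sigmoid(true_states[chosen]) else 0
        arms[chosen].update(reward)  # only the played arm's reward is observed
        pulls[chosen] += 1
        total += reward
    return pulls, total


if __name__ == "__main__":
    print(run())
```

Only the played arm is reweighted and resampled, since only its reward is observed; the other arms' particles are still propagated, so their uncertainty grows until they are tried again.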
Pages: 57
Related papers
50 in total
  • [41] BACKWARD SEQUENTIAL MONTE CARLO FOR MARGINAL SMOOTHING
    Kronander, Joel
    Schon, Thomas B.
    Dahlin, Johan
    2014 IEEE WORKSHOP ON STATISTICAL SIGNAL PROCESSING (SSP), 2014 : 368 - 371
  • [42] ONLINE SEQUENTIAL MONTE CARLO EM ALGORITHM
    Cappe, Olivier
    2009 IEEE/SP 15TH WORKSHOP ON STATISTICAL SIGNAL PROCESSING, VOLS 1 AND 2, 2009 : 37 - 40
  • [43] Sequential Monte Carlo for rare event estimation
    Cerou, F.
    Del Moral, P.
    Furon, T.
    Guyader, A.
    STATISTICS AND COMPUTING, 2012, 22 (03) : 795 - 808
  • [44] Sequential Monte Carlo learning with hyperparameter adjustments
    Wada, K
    Yosui, K
    Nakada, Y
    Matsumoto, T
    PROCEEDING OF THE 2002 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3, 2002 : 274 - 279
  • [45] Waste-free sequential Monte Carlo
    Dau, Hai-Dang
    Chopin, Nicolas
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2022, 84 (01) : 114 - 148
  • [46] Independent Resampling Sequential Monte Carlo Algorithms
    Lamberti, Roland
    Petetin, Yohan
    Desbouvries, Francois
    Septier, Francois
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2017, 65 (20) : 5318 - 5333
  • [47] Accelerating sequential Monte Carlo with surrogate likelihoods
    Bon, Joshua J.
    Lee, Anthony
    Drovandi, Christopher
    STATISTICS AND COMPUTING, 2021, 31 (05)
  • [48] Properties of marginal sequential Monte Carlo methods
    Crucinio, Francesca R.
    Johansen, Adam M.
    STATISTICS & PROBABILITY LETTERS, 2023, 203
  • [49] Sequential Monte Carlo with Highly Informative Observations
    Del Moral, Pierre
    Murray, Lawrence M.
    SIAM-ASA JOURNAL ON UNCERTAINTY QUANTIFICATION, 2015, 3 (01) : 969 - 997
  • [50] SMCTC: Sequential Monte Carlo in C++
    Johansen, Adam M.
    JOURNAL OF STATISTICAL SOFTWARE, 2009, 30 (06) : 1 - 41