SEQUENTIAL MONTE CARLO BANDITS

Authors
Urteaga, Inigo [1, 2]
Wiggins, Chris H. [3]
Affiliations
[1] BCAM Basque Ctr Appl Math, Bilbao, Spain
[2] Basque Fdn Sci, Ikerbasque, Bilbao, Spain
[3] Columbia Univ, Dept Appl Phys & Appl Math, New York, NY USA
Keywords
Sequential Monte Carlo; multi-armed bandits; restless bandits; linear dynamical systems; nonlinear reward functions; HIDDEN ARMA PROCESSES; PARAMETER-ESTIMATION; PARTICLE FILTERS; BAYESIAN-ESTIMATION; STATE; ALLOCATION
DOI
10.3934/fods.2024005
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
We extend state-of-the-art Bayesian multi-armed bandit (MAB) algorithms beyond their original setting by making use of sequential Monte Carlo (SMC) methods. A MAB is a sequential decision-making problem in which the goal is to learn a policy that maximizes long-term payoff, and in which only the reward of the executed action is observed. In the stochastic MAB, the reward for each action is generated from an unknown distribution, often assumed to be stationary. To decide which action to take next, a MAB agent must learn the characteristics of the unknown reward distribution, e.g., compute its sufficient statistics. However, these statistics are analytically intractable except in simple, stationary cases. Here we use SMC to estimate the statistics that Bayesian MAB agents compute, and devise flexible policies that can address a rich class of bandit problems, namely MABs with nonlinear, stateless- and context-dependent reward distributions that evolve over time. We show how non-stationary bandits, where time dynamics are modeled via linear dynamical systems, can be successfully addressed by SMC-based Bayesian bandit agents. We empirically demonstrate good regret performance of the proposed SMC-based bandit policies in several MAB scenarios that have remained elusive, i.e., in non-stationary bandits with nonlinear rewards.
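To make the idea in the abstract concrete, the following is a minimal sketch of one way an SMC-based Bayesian bandit agent could look. It is not the authors' algorithm. It assumes, purely for illustration, that each arm has a scalar latent state following a linear AR(1) model, that rewards are Bernoulli with a logistic (nonlinear) link, and that the agent combines a bootstrap particle filter with Thompson sampling. The names `ParticleArm`, `run_bandit`, `phi`, and `sigma` are hypothetical and do not come from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ParticleArm:
    """Bootstrap particle filter over one arm's latent state (illustrative only)."""

    def __init__(self, n_particles, phi, sigma, rng):
        self.phi, self.sigma, self.rng = phi, sigma, rng
        # Assumed prior: stationary distribution of the AR(1) dynamics.
        prior_sd = sigma / math.sqrt(1.0 - phi ** 2)
        self.particles = [rng.gauss(0.0, prior_sd) for _ in range(n_particles)]

    def propagate(self):
        # Linear dynamics: theta_t = phi * theta_{t-1} + Gaussian noise.
        self.particles = [
            self.phi * p + self.rng.gauss(0.0, self.sigma) for p in self.particles
        ]

    def sample_expected_reward(self):
        # Thompson sampling: draw one particle from the approximate posterior
        # and return the expected reward it implies.
        return sigmoid(self.rng.choice(self.particles))

    def update(self, reward):
        # Weight particles by the Bernoulli likelihood, then resample.
        weights = [
            sigmoid(p) if reward == 1 else 1.0 - sigmoid(p) for p in self.particles
        ]
        self.particles = self.rng.choices(
            self.particles, weights=weights, k=len(self.particles)
        )


def run_bandit(n_arms=3, horizon=300, n_particles=200, phi=0.9, sigma=0.5, seed=0):
    rng = random.Random(seed)
    prior_sd = sigma / math.sqrt(1.0 - phi ** 2)
    theta = [rng.gauss(0.0, prior_sd) for _ in range(n_arms)]  # true latent states
    arms = [ParticleArm(n_particles, phi, sigma, rng) for _ in range(n_arms)]
    total_reward, pulls = 0, [0] * n_arms
    for _ in range(horizon):
        # Every arm evolves at every step (restless bandit), so the true
        # states and all particle sets are propagated before acting.
        theta = [phi * th + rng.gauss(0.0, sigma) for th in theta]
        for arm in arms:
            arm.propagate()
        draws = [arm.sample_expected_reward() for arm in arms]
        a = max(range(n_arms), key=draws.__getitem__)
        # Only the reward of the played arm is observed.
        reward = 1 if rng.random() < sigmoid(theta[a]) else 0
        arms[a].update(reward)
        total_reward += reward
        pulls[a] += 1
    return total_reward, pulls
```

Calling `run_bandit()` returns the cumulative reward and the per-arm pull counts for one simulated run. The sketch resamples at every update for simplicity; a more careful implementation would likely keep explicit weights and resample only when the effective sample size drops.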
Pages: 57