Stacked Thompson Bandits

Cited by: 3
Authors
Belzner, Lenz [1]
Gabor, Thomas [1]
Affiliation
[1] Ludwig Maximilians Univ Munchen, Inst Informat, Munich, Germany
DOI
10.1109/SEsCPS.2017.4
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
We introduce Stacked Thompson Bandits (STB) for efficiently generating plans that are likely to satisfy a given bounded temporal logic requirement. STB evaluates plans in a simulation and takes a Bayesian approach to using the resulting information to guide its search. In particular, we show that stacking multi-armed bandits and using Thompson sampling to guide each bandit's action selection enables STB to generate plans that satisfy requirements with high probability while exploring only a fraction of the search space.
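
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions the abstract does not state: one bandit per plan step, a Beta(1, 1) prior per arm, and a binary reward that is 1 when the simulated plan satisfies the requirement. The names `BetaBandit`, `stacked_thompson_search`, and `toy_simulate` are hypothetical, and the toy requirement stands in for a real bounded temporal logic check.

```python
import random


class BetaBandit:
    """Bernoulli bandit with an independent Beta(1, 1) prior per arm (assumed)."""

    def __init__(self, n_arms):
        self.alpha = [1.0] * n_arms  # 1 + number of observed successes
        self.beta = [1.0] * n_arms   # 1 + number of observed failures

    def select(self, rng):
        # Thompson sampling: draw one sample per arm from its posterior
        # and play the arm with the largest sample.
        samples = [rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, success):
        if success:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0


def stacked_thompson_search(simulate, n_actions, horizon, n_iterations, seed=0):
    """Search for a plan using a stack of bandits, one per plan step (assumed)."""
    rng = random.Random(seed)
    stack = [BetaBandit(n_actions) for _ in range(horizon)]
    for _ in range(n_iterations):
        # Each bandit in the stack picks the action for its own step.
        plan = [bandit.select(rng) for bandit in stack]
        # The simulation reports whether the plan satisfied the requirement.
        satisfied = simulate(plan)
        for bandit, action in zip(stack, plan):
            bandit.update(action, satisfied)
    # Return the plan with the highest posterior mean at every step.
    best_plan = []
    for bandit in stack:
        means = [a / (a + b) for a, b in zip(bandit.alpha, bandit.beta)]
        best_plan.append(max(range(n_actions), key=means.__getitem__))
    return best_plan, stack


def toy_simulate(plan):
    # Hypothetical stand-in for a requirement check: only one plan satisfies it.
    return plan == [1, 0, 1]


plan, stack = stacked_thompson_search(
    toy_simulate, n_actions=2, horizon=3, n_iterations=200, seed=1
)
print(plan)
```

Each call to `simulate` evaluates a single sampled plan, so the number of simulated plans equals `n_iterations` rather than the size of the full plan space.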
Pages: 18-21
Page count: 4