Stacked Thompson Bandits

Cited by: 3
Authors
Belzner, Lenz [1]
Gabor, Thomas [1]
Affiliations
[1] Ludwig Maximilians Univ Munchen, Inst Informat, Munich, Germany
DOI
10.1109/SEsCPS.2017.4
Chinese Library Classification number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
We introduce Stacked Thompson Bandits (STB) for efficiently generating plans that are likely to satisfy a given bounded temporal logic requirement. STB evaluates plans in a simulation and takes a Bayesian approach to using the resulting information to guide its search. In particular, we show that stacking multi-armed bandits and using Thompson sampling to guide each bandit's action selection enables STB to generate plans that satisfy requirements with high probability while searching only a fraction of the search space.
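The abstract gives no algorithmic detail, so the following is only a minimal sketch of one way the idea could look, not the authors' implementation. It assumes one Bernoulli bandit per plan step, a Beta(1, 1) prior per action, and a simulation that returns True when the whole plan satisfies the requirement. The names `stacked_thompson_search` and `toy_simulate`, and the toy requirement "action 0 at every step", are hypothetical.

```python
import random


def stacked_thompson_search(simulate, horizon, n_actions, iterations, rng):
    """Search for a plan with a stack of Thompson-sampling bandits.

    Assumption: one bandit per plan step, each arm tracked by Beta(alpha, beta)
    pseudo-counts that start at the uniform prior Beta(1, 1).
    """
    alpha = [[1.0] * n_actions for _ in range(horizon)]
    beta = [[1.0] * n_actions for _ in range(horizon)]

    for _ in range(iterations):
        # Build a plan: each step's bandit draws one sample per arm from its
        # posterior and plays the arm with the largest draw.
        plan = []
        for t in range(horizon):
            draws = [rng.betavariate(alpha[t][a], beta[t][a])
                     for a in range(n_actions)]
            plan.append(max(range(n_actions), key=draws.__getitem__))

        # Evaluate the whole plan once in the simulation.
        satisfied = simulate(plan)

        # Every bandit in the stack updates the arm it played with the
        # shared success/failure outcome.
        for t, a in enumerate(plan):
            if satisfied:
                alpha[t][a] += 1.0
            else:
                beta[t][a] += 1.0

    # Return, per step, the action with the highest posterior mean.
    return [
        max(range(n_actions),
            key=lambda a: alpha[t][a] / (alpha[t][a] + beta[t][a]))
        for t in range(horizon)
    ]


def toy_simulate(plan):
    """Hypothetical stand-in for a simulation plus requirement check:
    the plan satisfies the requirement only if every step uses action 0."""
    return all(action == 0 for action in plan)


if __name__ == "__main__":
    best = stacked_thompson_search(
        toy_simulate, horizon=3, n_actions=2, iterations=500,
        rng=random.Random(0),
    )
    print(best, toy_simulate(best))
```

In this sketch the search concentrates on promising actions because each bandit samples from its own posterior, so arms that rarely appear in satisfying plans are tried less and less often. A stochastic simulation could be plugged in for `toy_simulate` without changing the update rule.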
Pages: 18-21
Page count: 4