Stacked Thompson Bandits

Cited by: 3

Authors:

Belzner, Lenz [1]

Gabor, Thomas [1]

Affiliations:

[1] Ludwig Maximilians Univ Munchen, Inst Informat, Munich, Germany

Source:

2017 IEEE/ACM 3RD INTERNATIONAL WORKSHOP ON SOFTWARE ENGINEERING FOR SMART CYBER-PHYSICAL SYSTEMS (SESCPS 2017) | 2017

DOI:

10.1109/SEsCPS.2017.4

Chinese Library Classification (CLC) number:

TP31 [Computer Software]

Subject classification codes:

081202; 0835

Abstract:

We introduce Stacked Thompson Bandits (STB) for efficiently generating plans that are likely to satisfy a given bounded temporal logic requirement. STB evaluates plans in simulation and takes a Bayesian approach to using the resulting information to guide its search. In particular, we show that stacking multi-armed bandits and using Thompson sampling to guide each bandit's action selection enables STB to generate plans that satisfy requirements with high probability while searching only a fraction of the search space.
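The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes one Beta-Bernoulli bandit per plan step, a uniform Beta(1, 1) prior, and a simulator that returns only whether the whole plan satisfied the requirement. The names `sample_plan`, `stacked_thompson_bandits`, and `toy_simulate`, and the toy domain itself, are hypothetical and chosen for illustration only.

```python
import random


def sample_plan(posteriors, rng):
    """Thompson sampling per step: draw one value from each action's
    Beta posterior and pick the action with the largest draw."""
    plan = []
    for step in posteriors:
        draws = [rng.betavariate(alpha, beta) for alpha, beta in step]
        plan.append(max(range(len(draws)), key=draws.__getitem__))
    return plan


def stacked_thompson_bandits(simulate, horizon, n_actions, iterations, seed=0):
    """Stack one bandit per time step and update all of them from the
    simulated outcome of each sampled plan (assumed update rule)."""
    rng = random.Random(seed)
    # posteriors[t][a] = [alpha, beta]; Beta(1, 1) is a uniform prior.
    posteriors = [[[1, 1] for _ in range(n_actions)] for _ in range(horizon)]
    for _ in range(iterations):
        plan = sample_plan(posteriors, rng)
        satisfied = simulate(plan, rng)
        for t, action in enumerate(plan):
            # Success increments alpha, failure increments beta.
            posteriors[t][action][0 if satisfied else 1] += 1
    # Report the action with the highest posterior mean at each step.
    best = [
        max(range(n_actions), key=lambda a, s=step: s[a][0] / (s[a][0] + s[a][1]))
        for step in posteriors
    ]
    return best, posteriors


def toy_simulate(plan, rng):
    """Hypothetical domain: action 1 moves forward (succeeds with
    probability 0.9), action 0 waits. The bounded requirement is to
    reach position >= 3 within the plan's horizon."""
    position = 0
    for action in plan:
        if action == 1 and rng.random() < 0.9:
            position += 1
    return position >= 3


if __name__ == "__main__":
    plan, _ = stacked_thompson_bandits(toy_simulate, horizon=4, n_actions=2, iterations=500)
    print("plan with highest posterior means:", plan)
```

In this sketch every bandit in the stack receives the same binary outcome, so credit for satisfying the requirement is shared across all steps of the plan. That is a simplifying assumption; other credit assignments are possible.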

Pages: 18-21

Page count: 4

