Globally Informative Thompson Sampling for Structured Bandit Problems with Application to CrowdTranscoding

Cited by: 0

Authors:

Liu, Xingchi [1]

Derakhshani, Mahsa [1]

Zhu, Ziming [2]

Lambotharan, Sangarapillai [1]

Affiliations:

[1] Loughborough Univ, Wolfson Sch Mech Elect & Mfg Engn, Signal Proc & Networks Res Grp, Loughborough LE11 3TU, Leics, England

[2] Toshiba Europe Ltd, Bristol Res & Innovat Lab, Bristol BS1 4ND, Avon, England

Source:

3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021) | 2021

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC)

Keywords:

Multi-armed bandit; Thompson sampling; Structured bandit; Edge computing

DOI:

10.1109/ICAIIC51459.2021.9415255

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The multi-armed bandit is a widely studied model for sequential decision-making problems. The most studied variant in the literature is the stochastic bandit, in which the reward of each arm follows an independent distribution. However, in a wide range of applications the rewards of different alternatives are correlated to some extent. In this paper, a class of structured bandit problems is studied in which the rewards of different arms are functions of the same unknown parameter vector. To minimize the cumulative learning regret, we propose a globally informative Thompson sampling algorithm that learns and leverages the correlation among arms and can handle an unknown multidimensional parameter and non-monotonic reward functions. Our studies demonstrate that the proposed algorithm achieves a significant improvement in learning speed. In particular, the designed algorithm is used to solve an edge transcoder selection problem in crowdsourced live video streaming systems and shows superior performance compared with existing schemes.
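
The setting described in the abstract, where every arm's mean reward is a function of one shared unknown parameter vector, can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic Thompson sampling loop over a finite grid of candidate parameter values, assuming known reward functions and Gaussian noise with known standard deviation. The function name, the reward functions, the grid, and all numeric values are hypothetical choices made only for illustration.

```python
import numpy as np


def structured_thompson_sampling(reward_fns, candidates, true_theta,
                                 horizon, noise_std=0.5, seed=0):
    """Thompson sampling with one shared parameter (illustrative sketch).

    reward_fns: callables mapping a parameter vector to an arm's mean reward
                (assumed known to the learner).
    candidates: array of shape (M, d), a hypothetical discretization of the
                parameter space that carries the posterior.
    """
    rng = np.random.default_rng(seed)
    candidates = np.asarray(candidates, dtype=float)
    theta = np.asarray(true_theta, dtype=float)

    # means[m, k]: mean reward of arm k if candidate m were the true parameter.
    means = np.array([[f(c) for f in reward_fns] for c in candidates])
    true_means = np.array([f(theta) for f in reward_fns])

    log_post = np.zeros(len(candidates))  # uniform prior, kept in log space
    regret = np.zeros(horizon)

    for t in range(horizon):
        # Normalize the posterior and sample one candidate parameter from it.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        m = rng.choice(len(candidates), p=post)

        # Play the arm that is best under the sampled parameter.
        arm = int(np.argmax(means[m]))
        reward = true_means[arm] + noise_std * rng.standard_normal()

        # Gaussian log-likelihood update over all candidates: a single pull
        # reshapes the belief about the shared parameter, and so about every arm.
        log_post += -0.5 * ((reward - means[:, arm]) / noise_std) ** 2
        regret[t] = true_means.max() - true_means[arm]

    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return np.cumsum(regret), post


# Hypothetical example: three arms, a 2-D parameter, non-monotonic rewards.
reward_fns = [
    lambda th: np.sin(th[0]) + 0.5 * th[1],
    lambda th: 1.0 - (th[0] - 1.0) ** 2,
    lambda th: th[0] * th[1],
]
grid = np.linspace(0.0, 2.0, 11)
candidates = np.array([[a, b] for a in grid for b in grid])
cum_regret, posterior = structured_thompson_sampling(
    reward_fns, candidates, true_theta=[1.0, 1.0], horizon=500)
```

Because the posterior is held over the shared parameter rather than per arm, an observation from one arm also updates the predicted rewards of the arms that were not pulled, which is the kind of correlation the abstract refers to. A finite grid is used here only to keep the posterior update exact and short.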


Pages: 210-215

Page count: 6
