Globally Informative Thompson Sampling for Structured Bandit Problems with Application to CrowdTranscoding

Cited by: 0
Authors
Liu, Xingchi [1]
Derakhshani, Mahsa [1]
Zhu, Ziming [2]
Lambotharan, Sangarapillai [1]
Affiliations
[1] Loughborough Univ, Wolfson Sch Mech Elect & Mfg Engn, Signal Proc & Networks Res Grp, Loughborough LE11 3TU, Leics, England
[2] Toshiba Europe Ltd, Bristol Res & Innovat Lab, Bristol BS1 4ND, Avon, England
Source
3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021) | 2021
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Multi-armed bandit; Thompson sampling; Structured bandit; Edge computing; MULTIARMED BANDIT;
DOI
10.1109/ICAIIC51459.2021.9415255
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The multi-armed bandit is a widely studied model for sequential decision-making problems. The most studied variant in the literature is the stochastic bandit, in which the reward of each arm follows an independent distribution. However, in a wide range of applications the rewards of different alternatives are correlated to some extent. In this paper, a class of structured bandit problems is studied in which the rewards of different arms are functions of the same unknown parameter vector. To minimize the cumulative learning regret, we propose a globally-informative Thompson sampling algorithm that learns and leverages the correlation among arms and can deal with an unknown multidimensional parameter and non-monotonic reward functions. Our studies demonstrate that the proposed algorithm achieves a significant improvement in learning speed. In particular, the designed algorithm is used to solve an edge transcoder selection problem in crowdsourced live video streaming systems and shows superior performance compared to the existing schemes.
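The general idea of Thompson sampling over a shared parameter can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this record. It assumes known mean-reward functions, Gaussian reward noise with known standard deviation, and a uniform prior over a finite grid of candidate parameter vectors. The function name `structured_thompson_sampling`, the three example arms, and all numeric values are hypothetical.

```python
import math
import random


def structured_thompson_sampling(reward_fns, theta_grid, true_theta,
                                 horizon, noise_std=0.5, seed=0):
    """Thompson sampling with a grid posterior over one shared parameter.

    reward_fns: list of functions mapping a parameter to an arm's mean reward.
    theta_grid: candidate parameter values (assumed uniform prior).
    Returns the cumulative (pseudo-)regret and the posterior mode.
    """
    rng = random.Random(seed)
    log_post = [0.0] * len(theta_grid)  # uniform prior, kept in log space
    best_mean = max(f(true_theta) for f in reward_fns)
    regret = 0.0
    for _ in range(horizon):
        # Draw one parameter from the current posterior.
        shift = max(log_post)
        weights = [math.exp(lp - shift) for lp in log_post]
        theta_s = rng.choices(theta_grid, weights=weights, k=1)[0]
        # Play the arm that is best under the sampled parameter.
        arm = max(range(len(reward_fns)),
                  key=lambda k: reward_fns[k](theta_s))
        reward = reward_fns[arm](true_theta) + rng.gauss(0.0, noise_std)
        # A single observation updates the belief about the shared
        # parameter, and therefore about every arm at once.
        for i, th in enumerate(theta_grid):
            err = reward - reward_fns[arm](th)
            log_post[i] -= err * err / (2.0 * noise_std ** 2)
        regret += best_mean - reward_fns[arm](true_theta)
    mode = max(range(len(theta_grid)), key=lambda i: log_post[i])
    return regret, theta_grid[mode]


# Hypothetical two-dimensional parameter and non-monotonic reward functions.
grid = [(a / 10, b / 10) for a in range(11) for b in range(11)]
arms = [
    lambda th: 1.0 - (th[0] - 0.2) ** 2,
    lambda th: 1.0 - (th[1] - 0.8) ** 2,
    lambda th: 0.5 + 0.5 * th[0] * th[1],
]
regret, map_theta = structured_thompson_sampling(
    arms, grid, true_theta=(0.7, 0.4), horizon=500)
print(regret, map_theta)
```

Because every arm's reward depends on the same parameter, pulling one arm narrows the posterior for all arms. That sharing is the structural property the abstract refers to as correlation among arms.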
Pages: 210-215
Page count: 6
Related papers
50 in total
  • [1] Thompson Sampling for Non-Stationary Bandit Problems
    Qi, Han
    Guo, Fei
    Zhu, Li
    ENTROPY, 2025, 27 (01)
  • [2] Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems
    Jung, Young Hun
    Tewari, Ambuj
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [3] Thompson Sampling for the Multinomial Logit Bandit
    Agrawal, Shipra
    Avadhanula, Vashist
    Goyal, Vineet
    Zeevi, Assaf
    MATHEMATICS OF OPERATIONS RESEARCH, 2025,
  • [4] Thompson Sampling Based Mechanisms for Stochastic Multi-Armed Bandit Problems
    Ghalme, Ganesh
    Jain, Shweta
    Gujar, Sujit
    Narahari, Y.
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 87 - 95
  • [5] Contextual Bandit for Active Learning: Active Thompson Sampling
    Bouneffouf, Djallel
    Laroche, Romain
    Urvoy, Tanguy
    Feraud, Raphael
    Allesiardo, Robin
    NEURAL INFORMATION PROCESSING (ICONIP 2014), PT I, 2014, 8834 : 405 - 412
  • [6] Bandit Algorithms Based on Thompson Sampling for Bounded Reward Distributions
    Riou, Charles
    Honda, Junya
    ALGORITHMIC LEARNING THEORY, VOL 117, 2020, 117 : 777 - 826
  • [7] The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models
    Lee, Jongyeong
    Chiang, Chao-Kai
    Sugiyama, Masashi
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 13383 - 13390
  • [8] A Thompson Sampling Approach to Unifying Causal Inference and Bandit Learning
    Xu, Hanxuan
    Xie, Hong
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2023, PT II, 2023, 13936 : 255 - 266
  • [9] A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit
    Yang, Long
    Li, Zhao
    Hu, Zehong
    Ruan, Shasha
    Pan, Gang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (09) : 5332 - 5341
  • [10] An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting
    Kalkanli, Cem
    Ozgur, Ayfer
    2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2020, : 2783 - 2788