Globally Informative Thompson Sampling for Structured Bandit Problems with Application to CrowdTranscoding

Cited by: 0
Authors
Liu, Xingchi [1]
Derakhshani, Mahsa [1]
Zhu, Ziming [2]
Lambotharan, Sangarapillai [1]
Affiliations
[1] Loughborough Univ, Wolfson Sch Mech Elect & Mfg Engn, Signal Proc & Networks Res Grp, Loughborough LE11 3TU, Leics, England
[2] Toshiba Europe Ltd, Bristol Res & Innovat Lab, Bristol BS1 4ND, Avon, England
Source
3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021) | 2021
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Multi-armed bandit; Thompson sampling; Structured bandit; Edge computing; MULTIARMED BANDIT;
DOI
10.1109/ICAIIC51459.2021.9415255
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The multi-armed bandit is a widely studied model for sequential decision-making problems. The most studied variant in the literature is the stochastic bandit, in which the reward of each arm follows an independent distribution. However, there is a wide range of applications where the rewards of different alternatives are correlated to some extent. In this paper, a class of structured bandit problems is studied in which the rewards of different arms are functions of the same unknown parameter vector. To minimize the cumulative learning regret, we propose a globally-informative Thompson sampling algorithm that learns and leverages the correlation among arms and can handle an unknown multidimensional parameter and non-monotonic reward functions. Our studies demonstrate that the proposed algorithm achieves a significant improvement in learning speed. In particular, the designed algorithm is used to solve an edge transcoder selection problem in crowdsourced live video streaming systems and shows superior performance compared with existing schemes.
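The idea in the abstract, that every pull is informative about all arms because their rewards depend on one shared parameter vector, can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic Thompson sampling loop that keeps a posterior over a discretized grid of candidate parameter vectors. The Bernoulli reward model, the grid, the three non-monotonic mean-reward functions, and all names (`structured_thompson_sampling`, `mean_fns`, `theta_grid`) are assumptions made for illustration.

```python
import numpy as np

def grid_posterior(log_post):
    """Normalize log-weights over the parameter grid into probabilities."""
    w = np.exp(log_post - log_post.max())
    return w / w.sum()

def structured_thompson_sampling(mean_fns, theta_grid, true_theta, horizon, seed=0):
    """Thompson sampling with one shared posterior over a parameter grid.

    Assumes Bernoulli rewards whose means are known functions of an
    unknown parameter vector (an illustrative modelling choice).
    """
    rng = np.random.default_rng(seed)
    # means[k, j]: expected reward of arm k if the parameter were theta_grid[j]
    means = np.array([[f(th) for th in theta_grid] for f in mean_fns])
    means = np.clip(means, 1e-6, 1 - 1e-6)  # keep the logarithms finite
    true_means = np.array([f(true_theta) for f in mean_fns])

    log_post = np.zeros(len(theta_grid))  # uniform prior over the grid
    pulls = np.zeros(len(mean_fns), dtype=int)
    regret = 0.0
    for _ in range(horizon):
        # Sample one candidate parameter from the current posterior.
        j = rng.choice(len(theta_grid), p=grid_posterior(log_post))
        # Play the arm that would be best if that sample were the truth.
        arm = int(np.argmax(means[:, j]))
        reward = rng.random() < true_means[arm]
        # One observation updates the belief about the shared parameter,
        # and therefore about every arm, not only the one that was pulled.
        log_post += np.log(means[arm]) if reward else np.log1p(-means[arm])
        pulls[arm] += 1
        regret += true_means.max() - true_means[arm]
    return pulls, regret, grid_posterior(log_post)

# Hypothetical example: three arms, two-dimensional parameter in [0, 1]^2.
mean_fns = [
    lambda th: 0.2 + 0.6 * np.exp(-((th[0] - 0.3) ** 2 + (th[1] - 0.7) ** 2) / 0.1),
    lambda th: 0.2 + 0.6 * np.exp(-((th[0] - 0.8) ** 2 + (th[1] - 0.2) ** 2) / 0.1),
    lambda th: 0.3 + 0.2 * th[0],
]
axis = np.linspace(0.0, 1.0, 11)
theta_grid = np.array([(a, b) for a in axis for b in axis])
pulls, regret, post = structured_thompson_sampling(
    mean_fns, theta_grid, true_theta=(0.3, 0.7), horizon=2000
)
print("pulls per arm:", pulls, "cumulative regret:", round(regret, 2))
```

A grid posterior is chosen here only because it stays exact for arbitrary, non-monotonic reward functions in low dimensions; it does not scale to high-dimensional parameters, where an approximate posterior would be needed instead.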
Pages: 210-215
Number of pages: 6