Thompson Sampling Based Multi-Armed-Bandit Mechanism Using Neural Networks

Cited by: 0
Authors
Manisha, Padala [1 ]
Gujar, Sujit [1 ]
Affiliation
[1] Int Inst Informat Technol, Hyderabad, Telangana, India
Keywords
Mechanism Design; MAB; Neural Networks;
DOI
Not available
CLC number
TP301 [Theory and methods];
Discipline code
081202 ;
Abstract
In many practical applications, such as crowd-sourcing and online advertising, the use of auction-based mechanisms depends on inherent stochastic parameters that are unknown. These parameters are learned using multi-armed bandit (MAB) algorithms, and mechanisms that incorporate such learning are referred to as multi-armed-bandit (MAB) mechanisms. While most MAB mechanisms rely on frequentist approaches such as upper-confidence-bound algorithms, recent work has shown that Bayesian approaches such as Thompson sampling yield mechanisms with better regret bounds; this lower regret, however, comes at the cost of a weaker game-theoretic property, namely Within-Period Dominant Strategy Incentive Compatibility (WP-DSIC). The existing payment rules used in Thompson sampling based mechanisms may cause negative utility to the auctioneer. Moreover, if we wish to minimize the cost to the auctioneer, it is very challenging to design payment rules that satisfy WP-DSIC while learning through Thompson sampling. In this work, we propose a data-driven approach to designing MAB mechanisms. Specifically, we use neural networks to design a payment rule that is WP-DSIC, while the allocation rule is modeled using Thompson sampling. Our results, in a crowd-sourcing setting where quality workers are recruited, indicate that the learned payment rule achieves lower cost while maximizing social welfare and reducing the variance of the agents' utilities.
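The allocation rule the abstract refers to is standard Thompson sampling. As an illustrative sketch only (not the paper's implementation — the function names, Bernoulli reward model, and uniform Beta(1, 1) priors are all assumptions), each arm, e.g. a worker of unknown quality, keeps a Beta posterior over its success probability; each round, the arm with the largest posterior draw is allocated:

```python
import random

def thompson_sample(successes, failures, rng=random):
    """Return the index of the arm whose Beta(1+s, 1+f) posterior draw is largest."""
    draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

def run(true_means, rounds=3000, seed=0):
    """Simulate Thompson sampling on Bernoulli arms with the given true means."""
    rng = random.Random(seed)
    k = len(true_means)
    successes, failures = [0] * k, [0] * k
    for _ in range(rounds):
        arm = thompson_sample(successes, failures, rng)
        # Bernoulli reward drawn from the chosen arm's true mean
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Over time the posterior concentrates and the highest-mean arm receives most of the allocations; the mechanism-design contribution of the paper is the neural-network payment rule layered on top of this allocation, which the sketch above does not attempt to model.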
Pages: 2111 - 2113 (3 pages)