Thompson Sampling Based Multi-Armed-Bandit Mechanism Using Neural Networks

Cited by: 0
Authors
Manisha, Padala [1 ]
Gujar, Sujit [1 ]
Affiliations
[1] Int Inst Informat Technol, Hyderabad, Telangana, India
Keywords
Mechanism Design; MAB; Neural Networks;
DOI
Not available
CLC Number
TP301 [Theory and Methods];
Subject Classification Code
081202 ;
Abstract
In many practical applications, such as crowd-sourcing and online advertisement, the use of mechanism design (auction-based mechanisms) depends on inherent stochastic parameters which are unknown. These parameters are learned using multi-armed bandit (MAB) algorithms, and mechanisms that incorporate MAB learning are referred to as multi-armed-bandit (MAB) mechanisms. While most MAB mechanisms rely on frequentist approaches such as upper-confidence-bound algorithms, recent work has shown that Bayesian approaches such as Thompson sampling yield mechanisms with better regret bounds; however, the lower regret comes at the cost of a weaker game-theoretic property, namely Within-Period Dominant Strategy Incentive Compatibility (WP-DSIC). The existing payment rules used in Thompson-sampling-based mechanisms may cause negative utility to the auctioneer. Moreover, if we wish to minimize the cost to the auctioneer, it is very challenging to design payment rules that satisfy WP-DSIC while learning through Thompson sampling. In this work, we propose a data-driven approach to designing MAB mechanisms: we use neural networks to design a payment rule that is WP-DSIC, while the allocation rule is modeled using Thompson sampling. Our results, in a crowd-sourcing setting for recruiting high-quality workers, indicate that the learned payment rule achieves lower cost while maximizing social welfare and also reducing the variance in the agents' utilities.
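The allocation side of the mechanism described in the abstract is driven by Thompson sampling. As a minimal illustration of that component only (a hypothetical Beta-Bernoulli sketch, not the paper's actual mechanism, which couples allocation with a learned WP-DSIC payment rule), a Thompson sampling loop maintains a Beta posterior per arm, samples a mean estimate from each posterior, and pulls the arm with the highest sample:

```python
import random

def thompson_sampling(arm_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling; returns per-arm pull counts.

    arm_probs are the (unknown to the learner) true success rates,
    used here only to simulate rewards.
    """
    rng = random.Random(seed)
    n = len(arm_probs)
    successes = [1] * n  # Beta(1, 1) uniform prior for each arm
    failures = [1] * n
    pulls = [0] * n
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior,
        # then play the arm whose sample is largest.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < arm_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.3, 0.7], horizon=1000)
```

Over time the posterior of the better arm concentrates, so it is sampled highest (and hence pulled) increasingly often; in the paper's crowd-sourcing setting, the "arms" correspond to workers of unknown quality.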
Pages: 2111 - 2113 (3 pages)
Related Papers (50 total)
  • [31] Using Multi-Armed Bandit Learning for Thwarting MAC Layer Attacks in Wireless Networks
    Dutta, Hrishikesh
    Bhuyan, Amit Kumar
    Biswas, Subir
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024,
  • [32] Multi-Armed Bandit for Edge Computing in Dynamic Networks with Uncertainty
    Ghoorchian, Saeed
    Maghsudi, Setareh
    PROCEEDINGS OF THE 21ST IEEE INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS (IEEE SPAWC2020), 2020,
  • [33] Learning the Truth by Weakly Connected Agents in Social Networks Using Multi-Armed Bandit
    Odeyomi, Olusola Tolulope
    IEEE ACCESS, 2020, 8 : 202090 - 202099
  • [35] A multi-armed bandit approach for exploring partially observed networks
    Madhawa, Kaushalya
    Murata, Tsuyoshi
    APPLIED NETWORK SCIENCE, 2019, 4 (01)
  • [37] Pruning Neural Networks Using Multi-Armed Bandits
    Ameen, Salem
    Vadera, Sunil
    COMPUTER JOURNAL, 2020, 63 (07): : 1099 - 1108
  • [38] Approximate Thompson Sampling via Epistemic Neural Networks
    Osband, Ian
    Wen, Zheng
    Asghari, Seyed Mohammad
    Dwaracherla, Vikranth
    Ibrahimi, Morteza
    Lu, Xiuyuan
    Van Roy, Benjamin
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 1586 - 1595
  • [39] BandiTS: Dynamic Timing Speculation Using Multi-Armed Bandit Based Optimization
    Zhang, Jeff
    Garg, Siddharth
    PROCEEDINGS OF THE 2017 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2017, : 922 - 925
  • [40] Multi-User Communication Networks: A Coordinated Multi-Armed Bandit Approach
    Avner, Orly
    Mannor, Shie
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2019, 27 (06) : 2192 - 2207