Thompson Sampling Based Multi-Armed-Bandit Mechanism Using Neural Networks

Cited by: 0

Authors:

Manisha, Padala [1]

Gujar, Sujit [1]

Affiliations:

[1] Int Inst Informat Technol, Hyderabad, Telangana, India

Source:

AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS | 2019

Keywords:

Mechanism Design; MAB; Neural Networks;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

In many practical applications, such as crowd-sourcing and online advertising, the use of mechanism design (auction-based mechanisms) depends on inherent stochastic parameters that are unknown. These parameters are learnt using multi-armed bandit (MAB) algorithms, and mechanisms that incorporate MAB are referred to as Multi-Armed-Bandit Mechanisms. While most MAB mechanisms focus on frequentist approaches such as upper confidence bound algorithms, recent work has shown that Bayesian approaches such as Thompson sampling yield mechanisms with better regret bounds. The lower regret, however, comes at the cost of a weaker game-theoretic property, namely Within-Period Dominant Strategy Incentive Compatibility (WP-DSIC). The existing payment rules used in Thompson sampling based mechanisms may cause negative utility to the auctioneer. In addition, if we wish to minimize the cost to the auctioneer, it is very challenging to design payment rules that satisfy WP-DSIC while learning through Thompson sampling. In our work, we propose a data-driven approach for designing MAB mechanisms. Specifically, we use neural networks to design a payment rule that is WP-DSIC, while the allocation rule is modeled using Thompson sampling. Our results, in the setting of crowd-sourcing for recruiting quality workers, indicate that the learned payment rule yields a lower cost to the auctioneer while maximizing social welfare and reducing the variance in the agents' utilities.


Pages: 2111-2113

Number of pages: 3
