Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward

Citations: 0
Authors
Wang, Xiong [1 ]
Jia, Riheng [2 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Zhejiang Normal Univ, Jinhua, Peoples R China
Source
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021 | 2021
Keywords
DYNAMIC-GAMES;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The mean field game framework facilitates the analysis of the multi-armed bandit (MAB) problem for a large number of agents by approximating their interactions with an average effect. Existing mean field models for multi-agent MAB mostly assume a binary reward function, which leads to tractable analysis but is usually not applicable in practical scenarios. In this paper, we study the mean field bandit game with a continuous reward function. Specifically, we focus on deriving the existence and uniqueness of the mean field equilibrium (MFE), thereby guaranteeing the asymptotic stability of the multi-agent system. To accommodate the continuous reward function, we encode the learned reward into an agent state, which is in turn mapped to its stochastic arm-playing policy and updated using realized observations. We show that the state evolution is upper semi-continuous, from which the existence of an MFE is obtained. Since Markov analysis mainly applies to discrete states, we transform the stochastic continuous state evolution into a deterministic ordinary differential equation (ODE). On this basis, we characterize a contraction mapping for the ODE to ensure a unique MFE for the bandit game. Extensive evaluations validate our MFE characterization and exhibit tight empirical regret for the MAB problem.
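The fixed-point structure described in the abstract (each agent's state maps to a stochastic policy, and the population's aggregate play feeds back into the continuous reward) can be illustrated with a minimal sketch. This is a hypothetical toy model, not the paper's construction: it assumes a two-arm congestion-style reward that decreases continuously in the fraction of agents playing an arm, a softmax policy, and a damped fixed-point iteration standing in for the contraction-mapping argument.

```python
# Toy mean-field fixed-point iteration for a two-arm bandit with a
# continuous, congestion-style reward. All names and the reward model
# are illustrative assumptions, not the paper's algorithm.
import math

def softmax(xs, temp=1.0):
    # Numerically stable softmax: higher estimated reward -> higher play probability.
    m = max(xs)
    es = [math.exp((x - m) / temp) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def reward(arm, load):
    # Continuous reward that decreases in the population fraction on the arm.
    base = [1.0, 0.8][arm]
    return base * (1.0 - load)

def mean_field_fixed_point(n_arms=2, damping=0.5, tol=1e-9, max_iter=10000):
    mf = [1.0 / n_arms] * n_arms  # mean field: fraction of agents per arm
    for _ in range(max_iter):
        rewards = [reward(a, mf[a]) for a in range(n_arms)]
        policy = softmax(rewards, temp=0.5)
        # Damped update; with this reward model the map is a contraction,
        # so the iteration converges to the unique fixed point.
        new_mf = [(1 - damping) * mf[a] + damping * policy[a] for a in range(n_arms)]
        if max(abs(new_mf[a] - mf[a]) for a in range(n_arms)) < tol:
            return new_mf
        mf = new_mf
    return mf

mf = mean_field_fixed_point()
```

In this toy setting the equilibrium puts more mass on the higher-base-reward arm, but congestion keeps the split interior rather than degenerate, mirroring how a continuous reward yields a continuous (rather than binary) equilibrium state.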
Pages: 3118-3124
Page count: 7
Related Papers
50 in total
  • [1] Noise Free Multi-armed Bandit Game
    Nakamura, Atsuyoshi
    Helmbold, David P.
    Warmuth, Manfred K.
    LANGUAGE AND AUTOMATA THEORY AND APPLICATIONS, LATA 2016, 2016, 9618 : 412 - 423
  • [2] Combinatorial Multi-Armed Bandit with General Reward Functions
    Chen, Wei
    Hu, Wei
    Li, Fu
    Li, Jian
    Liu, Yu
    Lu, Pinyan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [3] Possibilistic reward methods for the multi-armed bandit problem
    Martin, Miguel
    Jimenez-Martin, Antonio
    Mateos, Alfonso
    NEUROCOMPUTING, 2018, 310 : 201 - 212
  • [4] Multi-armed bandit approach for mean field game-based resource allocation in NOMA networks
    Benamor, Amani
    Habachi, Oussama
    Kammoun, Ines
    Cances, Jean-Pierre
    EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, 2024, 2024 (01)
  • [5] Existence and uniqueness of mean field equilibrium in continuous bandit game
    Wang, Xiong
    Li, Yuqing
    Jia, Riheng
    SCIENCE CHINA-INFORMATION SCIENCES, 2025, 68 (03)
  • [6] Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model
    Kim, Gi-Soo
    Paik, Myunghee Cho
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [7] Budgeted Multi-Armed Bandit in Continuous Action Space
    Trovo, Francesco
    Paladino, Stefano
    Restelli, Marcello
    Gatti, Nicola
    ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, 285 : 560 - 568
  • [8] Addictive Games: Case Study on Multi-Armed Bandit Game
    Kang, Xiaohan
    Ri, Hong
    Khalid, Mohd Nor Akmal
    Iida, Hiroyuki
    INFORMATION, 2021, 12 (12)
  • [9] The multi-armed bandit, with constraints
    Denardo, Eric V.
    Feinberg, Eugene A.
    Rothblum, Uriel G.
    ANNALS OF OPERATIONS RESEARCH, 2013, 208 : 37 - 62