Minimax Off-Policy Evaluation for Multi-Armed Bandits

Cited by: 3
Authors
Ma, Cong [1 ]
Zhu, Banghua [2 ]
Jiao, Jiantao [2 ,3 ]
Wainwright, Martin J. [2 ,3 ]
Affiliations
[1] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
[2] Univ Calif Berkeley UC Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley UC Berkeley, Dept Stat, Berkeley, CA 94720 USA
Keywords
Switches; Probability; Monte Carlo methods; Chebyshev approximation; Measurement; Computational modeling; Sociology; Off-policy evaluation; multi-armed bandits; minimax optimality; importance sampling; POLYNOMIALS;
DOI
10.1109/TIT.2022.3162335
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards, and develop minimax rate-optimal procedures under three settings. First, when the behavior policy is known, we show that the Switch estimator, a method that alternates between the plug-in and importance sampling estimators, is minimax rate-optimal for all sample sizes. Second, when the behavior policy is unknown, we analyze performance in terms of the competitive ratio, thereby revealing a fundamental gap between the settings of known and unknown behavior policies. When the behavior policy is unknown, any estimator must have mean-squared error larger, relative to the oracle estimator equipped with knowledge of the behavior policy, by a multiplicative factor proportional to the support size of the target policy. Moreover, we demonstrate that the plug-in approach achieves this worst-case competitive ratio up to a logarithmic factor. Third, we initiate the study of the partial knowledge setting, in which the minimum probability taken by the behavior policy is assumed known. We show that the plug-in estimator is optimal for relatively large values of the minimum probability, but is sub-optimal when the minimum probability is low. To remedy this gap, we propose a new estimator based on approximation by Chebyshev polynomials that provably achieves the optimal estimation error. Numerical experiments on both simulated and real data corroborate our theoretical findings.
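The three estimators contrasted in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's construction: the bandit instance, the policies, and the switching threshold `tau` are arbitrary assumptions made here for demonstration, and the paper's actual Switch estimator chooses its threshold to achieve the minimax rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed, not from the paper): a K-armed bandit with rewards
# in [0, 1], a known behavior policy pi_b that generated the log, and a
# target policy pi_t whose value pi_t . mu we want to estimate.
K = 5
true_means = rng.uniform(0.2, 0.8, size=K)    # per-arm mean rewards
pi_b = np.full(K, 1.0 / K)                    # uniform behavior policy
pi_t = np.array([0.6, 0.1, 0.1, 0.1, 0.1])    # target policy

n = 10_000
arms = rng.choice(K, size=n, p=pi_b)          # logged actions
rewards = rng.binomial(1, true_means[arms]).astype(float)  # logged rewards

# Importance sampling: reweight each logged reward by pi_t(a) / pi_b(a).
is_estimate = np.mean(pi_t[arms] / pi_b[arms] * rewards)

# Plug-in: estimate each arm's mean from the log, then average under pi_t.
counts = np.bincount(arms, minlength=K)
sums = np.bincount(arms, weights=rewards, minlength=K)
mu_hat = np.divide(sums, counts, out=np.zeros(K), where=counts > 0)
plugin_estimate = pi_t @ mu_hat

# Switch-style combination: use importance sampling on arms whose
# likelihood ratio pi_t(a)/pi_b(a) is below a threshold tau, and the
# plug-in estimate on the remaining (high-ratio) arms.
tau = 2.0                                     # arbitrary for illustration
ratio = pi_t / pi_b
low = ratio <= tau
switch_estimate = (
    np.mean(np.where(low[arms], ratio[arms] * rewards, 0.0))
    + pi_t[~low] @ mu_hat[~low]
)

true_value = pi_t @ true_means
```

The switching rule captures the intuition from the abstract: importance sampling is unbiased but has high variance exactly on the arms where the target policy puts much more mass than the behavior policy, and the plug-in estimate handles those arms instead.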
Pages: 5314-5339
Number of pages: 26