Recurrent Submodular Welfare and Matroid Blocking Semi-Bandits

Cited by: 0

Authors:

Papadigenopoulos, Orestis [1]

Caramanis, Constantine [2]

Affiliations:

[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA

[2] Univ Texas Austin, Elect & Comp Engn, Austin, TX 78712 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34

Keywords:

COMPLEXITY;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A recent line of research focuses on the study of stochastic multi-armed bandits (MAB), in the case where temporal correlations of specific structure are imposed between the player's actions and the reward distributions of the arms. These correlations lead to (sub-)optimal solutions that exhibit interesting dynamical patterns, a phenomenon that yields new challenges from both an algorithmic and a learning perspective. In this work, we extend the above direction to a combinatorial semi-bandit setting and study a variant of stochastic MAB, where arms are subject to matroid constraints and each arm becomes unavailable (blocked) for a fixed number of rounds after each play. A natural common generalization of the state-of-the-art for blocking bandits, and that for matroid bandits, only guarantees a 1/2-approximation for general matroids. In this paper we develop the novel technique of correlated (interleaved) scheduling, which allows us to obtain a polynomial-time (1 - 1/e)-approximation algorithm (asymptotically and in expectation) for any matroid. Along the way, we discover an interesting connection to a variant of Submodular Welfare Maximization, for which we provide (asymptotically) matching upper and lower approximability bounds. In the case where the mean arm rewards are unknown, our technique naturally decouples the scheduling from the learning problem, and thus allows us to control the (1 - 1/e)-approximate regret of a UCB-based adaptation of our online algorithm.
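To make the blocking dynamics in the abstract concrete, the following is a minimal sketch of the setting only, not the paper's interleaved scheduling algorithm. It assumes known mean rewards, a uniform matroid (at most `k` arms per round), and the convention that an arm with delay `d` played in round `t` can next be played in round `t + d`. The function names `greedy_blocking_schedule` and `expected_reward` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def greedy_blocking_schedule(
    means: Sequence[float],
    delays: Sequence[int],
    k: int,
    horizon: int,
) -> List[List[int]]:
    """Return, for each round, the arms played by a simple greedy heuristic.

    Assumptions (not taken from the paper): mean rewards are known, the
    matroid is uniform of rank k, and an arm with delay d played in round t
    becomes available again in round t + d.
    """
    # next_free[i] is the first round in which arm i may be played again.
    next_free = [0] * len(means)
    schedule: List[List[int]] = []
    for t in range(horizon):
        available = [i for i in range(len(means)) if next_free[i] <= t]
        # In a uniform matroid any set of at most k arms is independent,
        # so greedy keeps the k available arms with the highest means.
        chosen = sorted(available, key=lambda i: means[i], reverse=True)[:k]
        for i in chosen:
            next_free[i] = t + delays[i]  # block the arm after playing it
        schedule.append(sorted(chosen))
    return schedule


def expected_reward(means: Sequence[float], schedule: List[List[int]]) -> float:
    """Sum the mean rewards of all arms played over the whole schedule."""
    return sum(means[i] for played in schedule for i in played)
```

With three arms, `means = [0.9, 0.5, 0.2]`, `delays = [3, 2, 1]`, and `k = 1`, the heuristic cycles through the arms because the best arm is blocked for the two rounds after each play. This kind of periodic pattern is one example of the dynamical behavior the abstract refers to.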


Pages: 13
