Bandit Convex Optimization in Non-stationary Environments

Cited by: 0

Authors:

Zhao, Peng [1]

Wang, Guanghui [1]

Zhang, Lijun [1]

Zhou, Zhi-Hua [1]

Affiliations:

[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108 | 2020 / Vol. 108

Funding:

National Science Foundation (USA);

Keywords:

TRACKING;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Bandit Convex Optimization (BCO) is a fundamental framework for modeling sequential decision-making with partial information, where the only feedback available to the player is the one-point or two-point function values. In this paper, we investigate BCO in non-stationary environments and choose the dynamic regret as the performance measure, which is defined as the difference between the cumulative loss incurred by the algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length of the comparator sequence that reflects the non-stationarity of environments. We propose a novel algorithm that achieves $O(T^{3/4}(1 + P_T)^{1/2})$ and $O(T^{1/2}(1 + P_T)^{1/2})$ dynamic regret respectively for the one-point and two-point feedback models. The latter result is optimal, matching the $\Omega(T^{1/2}(1 + P_T)^{1/2})$ lower bound established in this paper. Notably, our algorithm is more adaptive to non-stationary environments since it does not require prior knowledge of the path-length $P_T$ ahead of time, which is generally unknown.
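
The two quantities the abstract relies on, the dynamic regret and the path-length of the comparator sequence, can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper. It assumes the path-length is measured in the Euclidean norm, i.e. the sum over t of ||u_t - u_{t-1}||, which the abstract does not spell out, and the names `path_length`, `dynamic_regret`, and `make_loss` are hypothetical helpers introduced only for this illustration.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]
Loss = Callable[[Point], float]


def path_length(comparators: Sequence[Point]) -> float:
    """Sum of distances between consecutive comparators.

    Assumption: the Euclidean norm is used (not stated in the abstract).
    """
    total = 0.0
    for u_prev, u_next in zip(comparators, comparators[1:]):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(u_prev, u_next)))
    return total


def dynamic_regret(
    losses: Sequence[Loss],
    decisions: Sequence[Point],
    comparators: Sequence[Point],
) -> float:
    """Cumulative loss of the decisions minus that of the comparator sequence."""
    if not (len(losses) == len(decisions) == len(comparators)):
        raise ValueError("losses, decisions and comparators must have equal length")
    return sum(f(x) - f(u) for f, x, u in zip(losses, decisions, comparators))


def make_loss(target: float) -> Loss:
    """Hypothetical convex loss f(x) = (x[0] - target)^2 with a drifting minimizer."""
    return lambda x: (x[0] - target) ** 2


# Toy non-stationary environment: the minimizer drifts from 0.0 to 1.0.
targets = [0.0, 0.5, 1.0]
losses = [make_loss(c) for c in targets]
comparators = [(c,) for c in targets]   # comparator tracks the moving minimizer
decisions = [(0.0,) for _ in targets]   # a player that never moves

print("path-length:", path_length(comparators))
print("dynamic regret:", dynamic_regret(losses, decisions, comparators))
```

In this toy run the static player accumulates regret against a comparator that follows the drifting minimizer, and the path-length grows with how far that comparator moves, which is the sense in which it reflects non-stationarity.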

Pages: 1508 - 1517

Number of pages: 10

50 records in total

[1] Bandit Convex Optimization in Non-stationary Environments
Zhao, Peng
Wang, Guanghui
Zhang, Lijun
Zhou, Zhi-Hua
JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
[2] Evolutionary Multiobjective Optimization in Non-Stationary Environments
Aragon, Victoria
Esquivel, Susana
Coello Coello, Carlos A.
JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY, 2005, 5 (03) : 133 - 143
[3] Particle swarm optimization in non-stationary environments
Esquivel, SC
Coello, CAC
ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2004, 2004, 3315 : 757 - 766
[4] Existence in undiscounted non-stationary non-convex multisector environments
Joshi, S
JOURNAL OF MATHEMATICAL ECONOMICS, 1997, 28 (01) : 111 - 126
[5] An Optimal Algorithm for Adversarial Bandit Problem with Multiple Plays in Non-Stationary Environments
Vural, N. Mert
Ozturk, Bugra
Kozat, Suleyman S.
2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020
[6] Thompson Sampling for Non-Stationary Bandit Problems
Qi, Han
Guo, Fei
Zhu, Li
ENTROPY, 2025, 27 (01)
[7] Contextual Multi-Armed Bandit With Costly Feature Observation in Non-Stationary Environments
Ghoorchian, Saeed
Kortukov, Evgenii
Maghsudi, Setareh
IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 820 - 830
[8] LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments
de Curto, J.
de Zarza, I.
Roig, Gemma
Cano, Juan Carlos
Manzoni, Pietro
Calafate, Carlos T.
ELECTRONICS, 2023, 12 (13)
[9] Evolutionary Optimization of Control Strategies for Non-Stationary Immersion Environments
Musaev, Alexander
Makshanov, Andrey
Grigoriev, Dmitry
MATHEMATICS, 2022, 10 (11)
[10] A Multi-armed Bandit Algorithm Available in Stationary or Non-stationary Environments Using Self-organizing Maps
Manome, Nobuhito
Shinohara, Shuji
Suzuki, Kouta
Tomonaga, Kosuke
Mitsuyoshi, Shunji
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I, 2019, 11727 : 529 - 540
