Bandit Convex Optimization in Non-stationary Environments

Times cited: 0
Authors
Zhao, Peng [1]
Wang, Guanghui [1]
Zhang, Lijun [1]
Zhou, Zhi-Hua [1]
Affiliation
[1] Nanjing University, National Key Laboratory for Novel Software Technology, Nanjing 210023, People's Republic of China
Funding
National Science Foundation (USA)
Keywords
Tracking
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bandit Convex Optimization (BCO) is a fundamental framework for modeling sequential decision-making with partial information, where the only feedback available to the player is one-point or two-point function values. In this paper, we investigate BCO in non-stationary environments and adopt dynamic regret as the performance measure, defined as the difference between the cumulative loss incurred by the algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path-length of the comparator sequence, which reflects the non-stationarity of the environment. We propose a novel algorithm that achieves $O(T^{3/4}(1+P_T)^{1/2})$ dynamic regret for the one-point feedback model and $O(T^{1/2}(1+P_T)^{1/2})$ dynamic regret for the two-point feedback model. The latter result is optimal, matching the $\Omega(T^{1/2}(1+P_T)^{1/2})$ lower bound established in this paper. Notably, our algorithm is more adaptive to non-stationary environments since it does not require prior knowledge of the path-length $P_T$, which is generally unknown.
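To make the quantities in the abstract concrete, here is a minimal Python sketch; it illustrates the definitions and is not the authors' algorithm. It assumes vectors are plain lists of floats, takes the path-length to be the usual $P_T = \sum_{t=2}^{T} \|u_t - u_{t-1}\|_2$ (the record itself does not spell out the formula), and computes dynamic regret as $\sum_t f_t(x_t) - \sum_t f_t(u_t)$. The helper `two_point_gradient` is a hypothetical name for the standard two-point spherical estimator $\frac{d}{2\delta}\big(f(x+\delta s) - f(x-\delta s)\big)\,s$ with $s$ uniform on the unit sphere; the paper's own estimator, step sizes, and adaptivity mechanism may differ.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Loss = Callable[[Sequence[float]], float]


def norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in v))


def path_length(comparators: Sequence[Sequence[float]]) -> float:
    """Assumed definition: P_T = sum_{t=2}^T ||u_t - u_{t-1}||_2."""
    return sum(
        norm([a - b for a, b in zip(cur, prev)])
        for prev, cur in zip(comparators, comparators[1:])
    )


def dynamic_regret(losses: Sequence[Loss], plays: Sequence[Sequence[float]],
                   comparators: Sequence[Sequence[float]]) -> float:
    """Cumulative loss of the plays minus that of the comparator sequence."""
    learner = sum(f(x) for f, x in zip(losses, plays))
    comparator = sum(f(u) for f, u in zip(losses, comparators))
    return learner - comparator


def random_unit_vector(d: int, rng: random.Random) -> Vector:
    """Uniform sample from the unit sphere in R^d (normalized Gaussian)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        n = norm(g)
        if n > 0.0:
            return [x / n for x in g]


def two_point_gradient(f: Loss, x: Sequence[float], delta: float,
                       rng: random.Random) -> Vector:
    """Hypothetical helper: standard two-point spherical gradient estimator.

    Uses only the two function values f(x + delta*s) and f(x - delta*s);
    its expectation equals the gradient of the delta-smoothed version of f.
    """
    d = len(x)
    s = random_unit_vector(d, rng)
    plus = [xi + delta * si for xi, si in zip(x, s)]
    minus = [xi - delta * si for xi, si in zip(x, s)]
    scale = d * (f(plus) - f(minus)) / (2.0 * delta)
    return [scale * si for si in s]


if __name__ == "__main__":
    # Toy 1-D example: f_t(x) = (x - c_t)^2 with a drifting center c_t.
    centers = [0.0, 1.0, 1.0, 0.0]
    losses = [lambda x, c=c: (x[0] - c) ** 2 for c in centers]
    plays = [[0.5] for _ in centers]       # learner plays 0.5 every round
    comparators = [[c] for c in centers]   # comparator tracks each minimizer
    print(path_length(comparators))                    # 2.0
    print(dynamic_regret(losses, plays, comparators))  # 1.0
```

The toy run uses a fixed play only to exercise the formulas: the drifting comparator has path-length 2, and the learner pays 0.25 per round against it. An actual BCO algorithm would instead update $x_t$ from the one-point or two-point bandit feedback.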
Pages: 1508-1517
Number of pages: 10