Predictive reinforcement learning in non-stationary environments using weighted mixture policy

Cited by: 0
Authors
Pourshamsaei, Hossein [1 ]
Nobakhti, Amin [1 ]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Azadi Ave, Tehran 111554363, Iran
Keywords
Reinforcement learning; Non-stationary environments; Adaptive learning rate; Mixture policy; Predictive reference tracking; MODEL;
DOI
10.1016/j.asoc.2024.111305
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement Learning (RL) in non-stationary environments is a formidable challenge. In some applications, abrupt changes in the environment model can be anticipated in advance, yet the existing literature lacks a framework that proactively exploits such predictions to improve reward optimization. This paper introduces a methodology that leverages these predictions preemptively, thereby maximizing the overall achieved performance. This is achieved by forming a weighted mixture policy from the optimal policies of the prevailing and forthcoming environment models. To ensure safe learning, an adaptive learning rate is derived for training the weighted mixture policy; it theoretically guarantees monotonic performance improvement at each update. Empirical trials focus on a model-free predictive reference tracking scenario with piecewise constant references. Using the cart-pole position control problem, it is demonstrated that the proposed algorithm surpasses prior techniques such as context Q-learning and RL with context detection in non-stationary environments. Moreover, it outperforms applying the individual optimal policy of each observed environment model (i.e., policies that do not use predictions).
Pages: 16
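
As a rough illustration of the weighted mixture policy described in the abstract, the Python sketch below blends the action probabilities of the policy optimal for the prevailing model with those of the policy optimal for the predicted upcoming model, ramping the mixing weight as the anticipated model switch approaches. This is a minimal sketch under assumed details: the function names (mixture_policy, blend_weight) and the linear ramp schedule are illustrative and not taken from the paper, which additionally derives an adaptive learning rate to train the mixture policy safely.

    import numpy as np

    def mixture_policy(policy_current, policy_next, weight):
        """Convex combination of two action-probability vectors.

        policy_current, policy_next: action probabilities under the
        prevailing and the predicted upcoming environment model.
        weight: mixing coefficient in [0, 1]; 0 keeps the current
        policy only, 1 uses the upcoming policy only.
        """
        probs = (1.0 - weight) * policy_current + weight * policy_next
        return probs / probs.sum()  # guard against rounding drift

    def blend_weight(steps_to_switch, horizon):
        """Illustrative schedule: increase the weight on the upcoming
        policy linearly as the predicted model switch gets closer."""
        return float(np.clip(1.0 - steps_to_switch / horizon, 0.0, 1.0))

    # Example: an agent 3 steps before a predicted switch, 10-step ramp.
    pi_cur = np.array([0.7, 0.2, 0.1])   # optimal policy for the current model
    pi_next = np.array([0.1, 0.3, 0.6])  # optimal policy for the predicted model
    w = blend_weight(steps_to_switch=3, horizon=10)
    print(mixture_policy(pi_cur, pi_next, w))

In the paper's setting the weights are trained rather than scheduled by hand, with the derived adaptive learning rate ensuring that each update monotonically improves performance.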