Predictive reinforcement learning in non-stationary environments using weighted mixture policy

Cited by: 0
Authors
Pourshamsaei, Hossein [1 ]
Nobakhti, Amin [1 ]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Azadi Ave, Tehran 11155-4363, Iran
Keywords
Reinforcement learning; Non-stationary environments; Adaptive learning rate; Mixture policy; Predictive reference tracking; MODEL
DOI
10.1016/j.asoc.2024.111305
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement Learning (RL) in non-stationary environments presents a formidable challenge. In some applications it may be possible to anticipate abrupt changes in the environment model, yet the existing literature lacks a framework that proactively harnesses such predictions to enhance reward optimization. This paper introduces a methodology that leverages these predictions preemptively to maximize overall performance. It does so by forming a weighted mixture policy from the optimal policies of both the prevailing and the forthcoming models. To ensure safe learning, an adaptive learning rate is derived for training the weighted mixture policy, which theoretically guarantees monotonic performance improvement at each update. Empirical trials focus on a model-free predictive reference tracking scenario involving piecewise-constant references. Using the cart-pole position control problem, it is demonstrated that the proposed algorithm surpasses prior techniques such as context Q-learning and RL with context detection in non-stationary environments. Moreover, it outperforms applying the individual optimal policy of each observed environment model (i.e., policies not utilizing predictions).
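The abstract only sketches the approach; the paper's exact formulation is not reproduced in this record. As a rough, hypothetical illustration of the core idea, the Python sketch below forms a convex combination of two tabular policies and ramps the mixing weight as a predicted model change approaches. The function names, the linear `schedule_weight` ramp, and the toy dimensions are all assumptions for illustration, not the authors' algorithm:

```python
import numpy as np

def mixture_policy(pi_current, pi_next, w):
    """Blend two stochastic policies (action-probability tables) with weight w in [0, 1].

    pi_current, pi_next: arrays of shape (n_states, n_actions) whose rows sum to 1.
    w = 0 follows the prevailing model's policy; w = 1 follows the forthcoming one.
    A convex combination of row-stochastic matrices is again row-stochastic.
    """
    return (1.0 - w) * pi_current + w * pi_next

def schedule_weight(t, t_switch, horizon):
    """Hypothetical linear ramp: begin shifting toward the next policy
    `horizon` steps before the predicted model change at t_switch."""
    return float(np.clip(1.0 - (t_switch - t) / horizon, 0.0, 1.0))

# Toy usage: 3 states, 2 actions, a model change predicted at step 100.
rng = np.random.default_rng(0)
pi_a = rng.dirichlet(np.ones(2), size=3)   # stand-in for the current model's optimal policy
pi_b = rng.dirichlet(np.ones(2), size=3)   # stand-in for the predicted model's optimal policy
for t in (80, 95, 100):
    w = schedule_weight(t, t_switch=100, horizon=20)
    pi_mix = mixture_policy(pi_a, pi_b, w)
    assert np.allclose(pi_mix.sum(axis=1), 1.0)  # rows remain valid distributions
    print(f"t={t}: w={w:.2f}")
```

In the paper itself, the mixing weight is not a fixed schedule but is trained with an adaptive learning rate chosen so that each update provably improves performance; the sketch above only shows the policy-blending structure that such training would act on.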
Pages: 16
Related papers
50 items in total
  • [31] Learning spectrum opportunities in non-stationary radio environments
    Oksanen, Jan
    Koivunen, Visa
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2447 - 2451
  • [32] Adaptive and on-line learning in non-stationary environments
    Lughofer, Edwin
    Sayed-Mouchaweh, Moamar
    EVOLVING SYSTEMS, 2015, 6 (02) : 75 - 77
  • [33] Improved Selection of Auxiliary Objectives using Reinforcement Learning in Non-Stationary Environment
    Petrova, Irina
    Buzdalova, Arina
    Buzdalov, Maxim
    2014 13TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2014, : 580 - 583
  • [34] Non-stationary noise estimation using dictionary learning and Gaussian mixture models
    Hughes, James M.
    Rockmore, Daniel N.
    Wang, Yang
    IMAGE PROCESSING: ALGORITHMS AND SYSTEMS XII, 2014, 9019
  • [35] Multi-Source Transfer Learning for Non-Stationary Environments
    Du, Honghui
    Minku, Leandro L.
    Zhou, Huiyu
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [36] Learning Latent and Changing Dynamics in Real Non-Stationary Environments
    Liu, Zihe
    Lu, Jie
    Xuan, Junyu
    Zhang, Guangquan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, 37 (04) : 1930 - 1942
  • [37] Reliable Localized On-line Learning in Non-stationary Environments
    Buschermoehle, Andreas
    Brockmann, Werner
    2014 IEEE CONFERENCE ON EVOLVING AND ADAPTIVE INTELLIGENT SYSTEMS (EAIS), 2014,
  • [38] Adaptive Learning With Extreme Verification Latency in Non-Stationary Environments
    Idrees, Mobin M. M.
    Stahl, Frederic
    Badii, Atta
    IEEE ACCESS, 2022, 10 : 127345 - 127364
  • [39] Reinforcement Learning in Non-Stationary Environments: An Intrinsically Motivated Stress Based Memory Retrieval Performance (SBMRP) Model
    Tang, Tiong Yew
    Egerton, Simon
    Kubota, Naoyuki
    2014 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2014, : 1728 - 1735
  • [40] P-MARL: Prediction-Based Multi-Agent Reinforcement Learning for Non-Stationary Environments
    Marinescu, Andrei
    Dusparic, Ivana
    Taylor, Adam
    Cahill, Vinny
    Clarke, Siobhan
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS (AAMAS'15), 2015, : 1897 - 1898