Predictive reinforcement learning in non-stationary environments using weighted mixture policy

Cited: 0
Authors
Pourshamsaei, Hossein [1 ]
Nobakhti, Amin [1 ]
Affiliation
[1] Sharif Univ Technol, Dept Elect Engn, Azadi Ave, Tehran 111554363, Iran
Keywords
Reinforcement learning; Non-stationary environments; Adaptive learning rate; Mixture policy; Predictive reference tracking; Model
DOI
10.1016/j.asoc.2024.111305
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement Learning (RL) in non-stationary environments presents a formidable challenge. In some applications, abrupt changes in the environment model can be anticipated, yet the existing literature lacks a framework that proactively harnesses such predictions to improve reward optimization. This paper introduces a methodology that preemptively leverages these predictions to maximize the overall achieved performance. It does so by generating a weighted mixture policy from the optimal policies of both the prevailing and the forthcoming models. To ensure safe learning, an adaptive learning rate is derived for training the weighted mixture policy; this theoretically guarantees monotonic performance improvement at each update. Empirical trials focus on a model-free predictive reference-tracking scenario with piecewise-constant references. On the cart-pole position control problem, the proposed algorithm is shown to surpass prior techniques such as context Q-learning and RL with context detection in non-stationary environments. Moreover, it outperforms applying the individual optimal policy derived from each observed environment model (i.e., policies that do not utilize predictions).
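The core idea of the abstract can be illustrated with a minimal sketch: blend the action probabilities of the policy optimal for the current model with those of the policy optimal for the predicted next model, shifting weight toward the latter as the predicted switch approaches. This is only an illustration of the mixture concept, not the paper's algorithm; the function names, the linear weight schedule, and the toy policies below are all assumptions, and the paper's adaptive learning rate for safe training is not reproduced here.

```python
import numpy as np

def mixture_policy(pi_current, pi_next, w):
    """Blend action probabilities of the prevailing and forthcoming
    optimal policies with mixture weight w in [0, 1]."""
    pi = (1.0 - w) * pi_current + w * pi_next
    return pi / pi.sum()  # renormalize to guard against rounding error

def schedule_weight(t, t_switch, horizon):
    """Hypothetical schedule: the weight on the forthcoming policy ramps
    linearly from 0 to 1 over the last `horizon` steps before the
    predicted model change at t_switch."""
    return float(np.clip(1.0 - (t_switch - t) / horizon, 0.0, 1.0))

# Toy example: three discrete actions, two precomputed optimal policies.
pi_a = np.array([0.8, 0.1, 0.1])  # optimal for the current model
pi_b = np.array([0.1, 0.1, 0.8])  # optimal for the predicted next model

for t in (90, 95, 100):  # model change predicted at t = 100
    w = schedule_weight(t, t_switch=100, horizon=10)  # w: 0.0, 0.5, 1.0
    print(t, w, mixture_policy(pi_a, pi_b, w))
```

Acting on the mixture ahead of the change lets the agent start accumulating reward under the forthcoming dynamics before they arrive, which is what a pure per-model policy (switching only after the change is observed) cannot do.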
Pages: 16