Predictive reinforcement learning in non-stationary environments using weighted mixture policy

Cited: 0
Authors
Pourshamsaei, Hossein [1 ]
Nobakhti, Amin [1 ]
Affiliation
[1] Sharif Univ Technol, Dept Elect Engn, Azadi Ave, Tehran 111554363, Iran
Keywords
Reinforcement learning; Non-stationary environments; Adaptive learning rate; Mixture policy; Predictive reference tracking; Model
DOI
10.1016/j.asoc.2024.111305
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement Learning (RL) in non-stationary environments presents a formidable challenge. In some applications, abrupt changes in the environment model can be anticipated, yet the existing literature lacks a framework that proactively harnesses such predictions to improve reward optimization. This paper introduces a methodology that preemptively leverages these predictions to maximize the overall achieved performance. It does so by generating a weighted mixture policy from the optimal policies of both the prevailing and the forthcoming models. To ensure safe learning, an adaptive learning rate is derived for training the weighted mixture policy; this theoretically guarantees monotonic performance improvement at each update. Empirical trials focus on a model-free predictive reference-tracking scenario with piecewise-constant references. On the cart-pole position control problem, the proposed algorithm is shown to surpass prior techniques such as context Q-learning and RL with context detection in non-stationary environments. Moreover, it outperforms applying the individual optimal policy derived from each observed environment model (i.e., policies that do not utilize predictions).
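The core idea of the abstract can be illustrated with a minimal sketch: blend the action probabilities of the policy optimal for the current model with those of the policy optimal for the predicted next model, shifting weight toward the latter as the predicted switch approaches. This is only an illustration of the mixture concept, not the paper's algorithm; the function names, the linear weight schedule, and the toy policies below are all assumptions, and the paper's adaptive learning rate for safe training is not reproduced here.

```python
import numpy as np

def mixture_policy(pi_current, pi_next, w):
    """Blend action probabilities of the prevailing and forthcoming
    optimal policies with mixture weight w in [0, 1]."""
    pi = (1.0 - w) * pi_current + w * pi_next
    return pi / pi.sum()  # renormalize to guard against rounding error

def schedule_weight(t, t_switch, horizon):
    """Hypothetical schedule: the weight on the forthcoming policy ramps
    linearly from 0 to 1 over the last `horizon` steps before the
    predicted model change at t_switch."""
    return float(np.clip(1.0 - (t_switch - t) / horizon, 0.0, 1.0))

# Toy example: three discrete actions, two precomputed optimal policies.
pi_a = np.array([0.8, 0.1, 0.1])  # optimal for the current model
pi_b = np.array([0.1, 0.1, 0.8])  # optimal for the predicted next model

for t in (90, 95, 100):  # model change predicted at t = 100
    w = schedule_weight(t, t_switch=100, horizon=10)  # w: 0.0, 0.5, 1.0
    print(t, w, mixture_policy(pi_a, pi_b, w))
```

Acting on the mixture ahead of the change lets the agent start accumulating reward under the forthcoming dynamics before they arrive, which is what a pure per-model policy (switching only after the change is observed) cannot do.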
Pages: 16