Neural Q-Learning Based on Residual Gradient for Nonlinear Control Systems

Cited by: 0
Authors
Si, Yanna [1 ]
Pu, Jiexin [1 ]
Zang, Shaofei [1 ]
Affiliations
[1] Henan Univ Sci & Technol, Sch Informat Engn, Luoyang, Peoples R China
Keywords
Q-learning; feedforward neural network; value function approximation; residual gradient method; nonlinear control systems;
DOI: not available
Chinese Library Classification (CLC): TP [Automation and computer technology]
Discipline code: 0812
Abstract
To solve the control problem of nonlinear systems over a continuous state space, this paper proposes a neural Q-learning algorithm based on the residual gradient method. First, a multi-layer feedforward neural network is used to approximate the Q-value function, overcoming the curse of dimensionality that limits classical tabular reinforcement learning. Then, based on the residual gradient method, mini-batch gradient descent over an experience-replay buffer is used to update the network parameters, which effectively reduces the number of iterations and speeds up learning. Moreover, momentum optimization is introduced to further stabilize the training process and improve convergence. To better balance exploration and exploitation, an epsilon-decreasing strategy replaces epsilon-greedy action selection. Simulation results on the CartPole control task demonstrate the correctness and effectiveness of the proposed algorithm.
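The update rule described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: a one-hidden-layer Q-network, a mini-batch residual-gradient update with momentum (differentiating the squared TD error through both Q(s, a) and Q(s', a'), unlike the direct "semi-gradient" update), and epsilon-decreasing action selection. All names, layer sizes, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class QNet:
    """One-hidden-layer feedforward Q-network: maps a state to one value per action."""
    def __init__(self, n_in, n_hidden, n_actions):
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_actions, n_hidden))
        self.b2 = np.zeros(n_actions)
        self.vel = [np.zeros_like(p) for p in self.params()]  # momentum buffers

    def params(self):
        return [self.W1, self.b1, self.W2, self.b2]

    def forward(self, s):
        h = np.tanh(self.W1 @ s + self.b1)
        return self.W2 @ h + self.b2, h

    def grad_q(self, s, a):
        """Gradient of Q(s, a) with respect to each parameter."""
        _, h = self.forward(s)
        dh = self.W2[a] * (1.0 - h ** 2)          # backprop through tanh
        gW2 = np.zeros_like(self.W2); gW2[a] = h
        gb2 = np.zeros_like(self.b2); gb2[a] = 1.0
        return [np.outer(dh, s), dh, gW2, gb2]

def residual_update(net, batch, gamma=0.99, lr=0.05, mom=0.9):
    """Mini-batch residual-gradient update with momentum (hyperparameters assumed)."""
    acc = [np.zeros_like(p) for p in net.params()]
    for s, a, r, s2, done in batch:
        q_sa = net.forward(s)[0][a]
        q2 = net.forward(s2)[0]
        a2 = int(np.argmax(q2))
        delta = r + (0.0 if done else gamma * q2[a2]) - q_sa
        g_sa = net.grad_q(s, a)
        g_s2 = net.grad_q(s2, a2)
        for i in range(len(acc)):
            # d/dtheta of 0.5*delta^2, with d(delta)/dtheta = gamma*grad Q(s',a') - grad Q(s,a)
            g_next = 0.0 if done else gamma * g_s2[i]
            acc[i] += delta * (g_next - g_sa[i])
    for p, v, g in zip(net.params(), net.vel, acc):
        v *= mom
        v += lr * g / len(batch)
        p -= v                                    # heavy-ball gradient descent step

def select_action(net, s, step, n_actions, eps0=1.0, decay=0.995, eps_min=0.05):
    """Epsilon-decreasing selection: exploration probability shrinks with the step count."""
    eps = max(eps_min, eps0 * decay ** step)
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(net.forward(s)[0]))
```

On terminal transitions the residual update reduces to plain regression of Q(s, a) toward the reward, so the sketch can be sanity-checked on a one-state, two-action problem before attaching it to a CartPole environment.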
Pages: 5