A GRADIENT DESCENT SARSA(λ) ALGORITHM BASED ON THE ADAPTIVE REWARD-SHAPING MECHANISM

Authors
Liu, Quan [1]
Fu, QiMing [1]
Xiao, Fei [1]
Fu, YuChen [1]
Affiliations
[1] Soochow Univ, Dept Comp Sci & Technol, Suzhou, Peoples R China
Source
INTELLIGENT AUTOMATION AND SOFT COMPUTING | 2013 / Vol. 19 / No. 4
Keywords
reinforcement learning; Sarsa (lambda); gradient descent; reward-shaping; adaptive
DOI
10.1080/10798587.2013.869119
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Based on an adaptive reward-shaping mechanism, we propose a novel gradient descent (GD) Sarsa(lambda) algorithm to address poor initial performance and slow convergence in reinforcement learning tasks with continuous state spaces. An adaptive normalized radial basis function (ANRBF) network is used to shape the reward. The reward-shaping mechanism propagates model knowledge to the learner in the form of an additional reward signal, so that initial performance and convergence speed can be improved. A function approximation algorithm named ANRBF-GD-Sarsa(lambda) is proposed based on the ANRBF network, and its convergence is analyzed theoretically. Experiments are conducted to show the good initial performance and fast convergence of the proposed algorithm.
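The abstract does not give the update rules, so the following is only a minimal sketch of the general idea, not the paper's ANRBF-GD-Sarsa(lambda). It assumes a toy one-dimensional task on [0, 1], fixed (non-adaptive) normalized RBF features, and a hand-picked potential function with standard potential-based shaping standing in for the learned ANRBF shaping signal. All names (`nrbf`, `potential`, `run_episode`, `CENTERS`, `WIDTH`, `GOAL`) and all parameter values are hypothetical.

```python
import math
import random

CENTERS = [i / 9 for i in range(10)]  # fixed RBF centers on [0, 1] (assumption)
WIDTH = 0.1                           # shared RBF width (assumption)
ACTIONS = (-0.1, 0.1)                 # step left / step right
GOAL = 0.95                           # episode ends once the state reaches this value


def nrbf(s):
    """Normalized RBF features: Gaussian activations divided by their sum."""
    acts = [math.exp(-((s - c) ** 2) / (2 * WIDTH ** 2)) for c in CENTERS]
    total = sum(acts)
    return [a / total for a in acts]


def q_value(w, phi, a):
    """Linear action value Q(s, a) = w[a] . phi(s)."""
    return sum(wi * fi for wi, fi in zip(w[a], phi))


def potential(s):
    """Hypothetical fixed potential: states nearer the goal score higher."""
    return s


def choose(w, phi, eps, rng):
    """Epsilon-greedy action selection over the two actions."""
    if rng.random() < eps:
        return rng.randrange(len(ACTIONS))
    qs = [q_value(w, phi, a) for a in range(len(ACTIONS))]
    return qs.index(max(qs))


def run_episode(w, rng, alpha=0.05, gamma=0.95, lam=0.9, eps=0.1,
                shaped=True, max_steps=500):
    """Run one episode of gradient-descent Sarsa(lambda); return steps taken."""
    s = 0.0
    phi = nrbf(s)
    a = choose(w, phi, eps, rng)
    e = [[0.0] * len(CENTERS) for _ in ACTIONS]  # eligibility traces
    for step in range(1, max_steps + 1):
        s2 = min(1.0, max(0.0, s + ACTIONS[a]))
        done = s2 >= GOAL
        r = 0.0 if done else -1.0
        if shaped:
            # Potential-based shaping term; terminal potential taken as 0.
            r += gamma * (0.0 if done else potential(s2)) - potential(s)
        phi2 = nrbf(s2)
        a2 = choose(w, phi2, eps, rng)
        target = r if done else r + gamma * q_value(w, phi2, a2)
        delta = target - q_value(w, phi, a)  # TD error before any weight change
        for b in range(len(ACTIONS)):
            for i in range(len(CENTERS)):
                e[b][i] *= gamma * lam       # decay every trace
                if b == a:
                    e[b][i] += phi[i]        # gradient of linear Q w.r.t. w[a]
                w[b][i] += alpha * delta * e[b][i]
        if done:
            return step
        s, phi, a = s2, phi2, a2
    return max_steps
```

A run would create `w = [[0.0] * len(CENTERS) for _ in ACTIONS]` and `rng = random.Random(0)`, then call `run_episode(w, rng)` repeatedly. Comparing step counts with `shaped=True` and `shaped=False` gives a rough feel for how an extra reward signal can affect early episodes, although this toy setup says nothing about the paper's reported results.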
Pages: 599-612
Page count: 14