Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA

Cited by: 36
Authors
Da Silva, Lucileide M. D. [1 ]
Torquato, Matheus F. [2 ]
Fernandes, Marcelo A. C. [3 ]
Affiliations
[1] Fed Inst Rio Grande do Norte, Dept Comp Sci & Technol, BR-59200000 Santa Cruz, Brazil
[2] Swansea Univ, Coll Engn, Swansea SA2 8PP, W Glam, Wales
[3] Univ Fed Rio Grande do Norte, Dept Comp Engn & Automat, BR-59078970 Natal, RN, Brazil
Keywords
FPGA; Q-learning; reinforcement learning; reconfigurable computing; HARDWARE; ARCHITECTURE; NETWORK;
DOI
10.1109/ACCESS.2018.2885950
CLC Classification Number: TP [Automation Technology, Computer Technology]
Subject Classification Code: 0812
Abstract
Q-learning is an off-policy reinforcement learning technique whose main advantage is the ability to obtain an optimal policy while interacting with an environment whose model is unknown. This paper proposes a parallel fixed-point Q-learning architecture implemented on a field-programmable gate array (FPGA), with a focus on minimizing the system's processing time. Convergence results are presented, and processing time and occupied area are analyzed for scenarios with different numbers of states and actions and for various fixed-point formats. The accuracy of the Q-learning response and the resolution error associated with reducing the number of bits are also studied for the hardware implementation, and the implementation details of the architecture are described. The entire project was developed on the Xilinx System Generator platform, targeting a Virtex-6 xc6vcx240t-1ff1156 FPGA.
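The Q-learning update that such an architecture parallelizes can be sketched in software as follows. This is a minimal tabular sketch, not the paper's implementation: the 5-state chain environment, the hyperparameters, and the random behavior policy below are all illustrative assumptions.

```python
import numpy as np

# Minimal tabular Q-learning sketch. The 5-state chain environment and all
# hyperparameters are illustrative assumptions, not taken from the paper.
N_STATES, N_ACTIONS = 5, 2        # actions: 0 = left, 1 = right
GAMMA, ALPHA = 0.9, 0.5           # discount factor, learning rate

def step(s, a):
    """Deterministic chain: taking 'right' in the last state pays reward 1
    and restarts at state 0; every other transition pays reward 0."""
    if a == 1:
        return (0, 1.0) if s == N_STATES - 1 else (s + 1, 0.0)
    return max(s - 1, 0), 0.0

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))
s = 0
for _ in range(20000):
    # Uniformly random behavior policy: because Q-learning is off-policy,
    # it still converges to the greedy (optimal) policy.
    a = int(rng.integers(N_ACTIONS))
    s2, r = step(s, a)
    # Core update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += ALPHA * (r + GAMMA * Q[s2].max() - Q[s, a])
    s = s2

print(Q.argmax(axis=1))  # greedy policy: always move right -> [1 1 1 1 1]
```

In the hardware version described by the abstract, the Q-table and the update arithmetic would be realized in fixed-point formats rather than floating point; shrinking the word length is what introduces the resolution error the paper analyzes.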
Pages: 2782-2798
Page count: 17
Related Papers (50 total)
  • [1] An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm
    Spano, Sergio; Cardarilli, Gian Carlo; Di Nunzio, Luca; Fazzolari, Rocco; Giardino, Daniele; Matta, Marco; Nannarelli, Alberto; Re, Marco
    IEEE ACCESS, 2019, 7: 186340-186351
  • [2] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao; Yan, Pengfei; Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637: 475-483
  • [3] Fuzzy Q-Learning for generalization of reinforcement learning
    Berenji, HR
    FUZZ-IEEE '96 - PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3, 1996: 2208-2214
  • [4] Deep Reinforcement Learning with Double Q-Learning
    van Hasselt, Hado; Guez, Arthur; Silver, David
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016: 2094-2100
  • [5] Reinforcement learning guidance law of Q-learning
    Zhang Q.; Ao B.; Zhang Q.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2020, 42(02): 414-419
  • [6] FARANE-Q: Fast Parallel and Pipeline Q-Learning Accelerator for Configurable Reinforcement Learning SoC
    Sutisna, Nana; Ilmy, Andi M. Riyadhus; Syafalni, Infall; Mulyawan, Rahmat; Adiono, Trio
    IEEE ACCESS, 2023, 11: 144-161
  • [7] Learning mixed behaviours with parallel Q-Learning
    Laurent, GJ; Piat, E
    2002 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-3, PROCEEDINGS, 2002: 1002-1007
  • [8] Feasible Q-Learning for Average Reward Reinforcement Learning
    Jin, Ying; Blanchet, Jose; Gummadi, Ramki; Zhou, Zhengyuan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [9] Mildly Conservative Q-Learning for Offline Reinforcement Learning
    Lyu, Jiafei; Ma, Xiaoteng; Li, Xiu; Lu, Zongqing
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [10] Adaptable Conservative Q-Learning for Offline Reinforcement Learning
    Qiu, Lyn; Li, Xu; Liang, Lenghan; Sun, Mingming; Yan, Junchi
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III, 2024, 14427: 200-212