Gait Learning of Quadruped Robot Based on Deep Arbitration Strategy

Authors
Zhu X. [1, 2]
Chen J. [1, 2]
Zhang S. [1, 2]
Liu X. [1, 2]
Ruan X. [1, 2]
Affiliations
[1] Department of Information, Beijing University of Technology, Beijing
[2] Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing
Keywords
arbitration mechanism; gait learning; quadruped robot; reinforcement learning
DOI
10.15918/j.tbit1001-0645.2022.213
Abstract
Reproducing the learning process of higher organisms is an important research direction in robotics. Several commonly used reinforcement learning algorithms based on actor-critic (AC) networks have been explored for this task, but they still have shortcomings, and improvements have been made to address them. In the deep deterministic policy gradient (DDPG) algorithm, overestimation of the Q value degrades the learning effect. Inspired by the arbitration mechanism in the prefrontal cortex of the brain, a deep arbitration actor-critic (DAAC) algorithm with two sets of evaluation networks is proposed. Through the arbitration mechanism, the better of the two evaluation networks is selected to update the policy parameters, which effectively addresses the Q-value overestimation problem. The algorithm enables the quadruped robot to reproduce the bionic gait learning process. In simulation experiments, DAAC was compared with three algorithms: DDPG, soft actor-critic (SAC), and proximal policy optimization (PPO). The experimental results show that the gait of the quadruped robot trained by DAAC performs better in three respects (reward value, machine stability, and speed), which verifies the superiority of the algorithm. © 2023 Beijing Institute of Technology. All rights reserved.
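The abstract does not state how the arbitration mechanism decides which evaluation network is "optimal". As a rough illustration only, the sketch below assumes a hypothetical rule: each critic keeps a running average of its absolute temporal-difference (TD) error, and the critic with the smaller average is chosen to drive the policy update. The names `CriticTracker`, `arbitrate`, and `td_error`, the decay value, and the sample numbers are all assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CriticTracker:
    """Running reliability estimate for one evaluation network (hypothetical helper)."""
    name: str
    decay: float = 0.9             # assumed smoothing factor, not from the paper
    avg_abs_td_error: float = 0.0  # exponential moving average of |TD error|

    def update(self, error: float) -> None:
        # Blend the newest absolute TD error into the running average.
        self.avg_abs_td_error = (
            self.decay * self.avg_abs_td_error + (1.0 - self.decay) * abs(error)
        )


def td_error(reward: float, gamma: float, q_next: float, q_current: float) -> float:
    """Standard one-step TD error: r + gamma * Q(s', a') - Q(s, a)."""
    return reward + gamma * q_next - q_current


def arbitrate(a: CriticTracker, b: CriticTracker) -> CriticTracker:
    """Assumed arbitration rule: pick the critic with the smaller running |TD error|.

    Ties are resolved in favour of the first critic.
    """
    return a if a.avg_abs_td_error <= b.avg_abs_td_error else b


# Made-up TD errors for two critics over three updates.
critic_1 = CriticTracker("critic_1")
critic_2 = CriticTracker("critic_2")
for e1, e2 in [(0.5, 0.2), (0.4, 0.1), (0.6, 0.3)]:
    critic_1.update(e1)
    critic_2.update(e2)

selected = arbitrate(critic_1, critic_2)
print(selected.name)  # critic_2 has the smaller running error in this toy data
```

In a full actor-critic training loop, the selected critic's Q estimate would be the one used in the policy-gradient step; the actual DAAC selection criterion may differ from this assumed rule.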
Pages: 1197-1204
Page count: 7