Step into High-Dimensional and Continuous Action Space: A Survey on Applications of Deep Reinforcement Learning to Robotics

Cited by: 0
Authors
Duo N. [1 ]
Lü Q. [1 ]
Lin H. [1 ]
Wei H. [1 ]
Affiliations
[1] Academy of Army Armored Force, Beijing
Source
Jiqiren/Robot | 2019 / Vol. 41 / No. 2
Keywords
Deep learning; Reinforcement learning; Robotics
DOI
10.13973/j.cnki.robot.180336
Abstract
Firstly, the emergence and development of deep reinforcement learning (DRL) are reviewed. Secondly, DRL algorithms for high-dimensional and continuous action spaces are classified into algorithms based on value function approximation, algorithms based on policy approximation, and algorithms based on other structures. Then, typical DRL algorithms and their characteristics are introduced, in particular their underlying ideas, advantages, and disadvantages. Finally, future trends in applying DRL to robotics are forecast according to the development directions of DRL algorithms. © 2019, Science Press. All rights reserved.
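For illustration only (not code from the survey): policy-approximation methods handle continuous action spaces directly by parameterizing a stochastic policy, rather than maximizing a learned value function over actions. The minimal sketch below shows a Gaussian policy updated with a REINFORCE-style policy gradient; the network size, the batch of transitions, and the hyperparameters are assumptions made for the example.

```python
# Minimal sketch (PyTorch): a Gaussian policy over a continuous 1-D action
# space, trained with a REINFORCE-style policy gradient. All dimensions and
# data below are hypothetical placeholders, not taken from the survey.
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.mean = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )
        # State-independent log standard deviation, a common parameterization.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def distribution(self, obs: torch.Tensor) -> torch.distributions.Normal:
        return torch.distributions.Normal(self.mean(obs), self.log_std.exp())

obs_dim, act_dim = 4, 1
policy = GaussianPolicy(obs_dim, act_dim)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# One policy-gradient update on a hypothetical batch of transitions.
obs = torch.randn(32, obs_dim)       # observations
actions = torch.randn(32, act_dim)   # continuous actions that were taken
returns = torch.randn(32)            # Monte-Carlo returns for those actions

dist = policy.distribution(obs)
log_prob = dist.log_prob(actions).sum(dim=-1)
loss = -(log_prob * returns).mean()  # ascend the expected return

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

By contrast, value-function-approximation methods (e.g., deep Q-networks) require an argmax over actions at each step, which is straightforward for discrete action sets but becomes an optimization problem of its own in continuous action spaces.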
Pages: 276-288
Number of pages: 12