Reward Shaping to Learn Natural Object Manipulation With an Anthropomorphic Robotic Hand and Hand Pose Priors via On-Policy Reinforcement Learning

Cited by: 1
Authors
Rivera, Patricio [1 ]
Oh, Jiheon [1 ]
Valarezo, Edwin [1 ]
Ryu, Gahyeon [1 ]
Jung, Hwanseok [1 ]
Lee, Jin Hyunk [1 ]
Jeong, Jin Gyun [1 ]
Kim, Tae-Seong [1 ]
Affiliations
[1] Kyung Hee Univ, Dept Elect & Informat Convergence Engn, Seoul, South Korea
Source
12TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC 2021): BEYOND THE PANDEMIC ERA WITH ICT CONVERGENCE INNOVATION | 2021
Funding
National Research Foundation, Singapore;
Keywords
Anthropomorphic Robotic Hand; Deep Reinforcement Learning; Hand Pose Priors; Object Manipulation;
DOI
10.1109/ICTC52510.2021.9620901
CLC Classification Number
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Classification Code
0808 ; 0809 ;
Abstract
A key challenge in reinforcement learning (RL) for robot manipulation is providing a reward function that allows reliable and stable learning of the task goals while the robot interacts with the environment. Unfortunately, rewards are usually task-specific, and their engineering is challenging and laborious, especially for an anthropomorphic robotic hand with high degrees of freedom. In this work, we consider a reward function for learning a policy under the constraint of minimizing the deviation of the robot hand pose from demonstration priors. We propose a shaped reward that yields efficient manipulation policies by incorporating five-fingered hand poses from grasping demonstrations of various objects into the early timesteps of the training episodes. The policy trained with our proposed reward (NPG+SR) raises the average success rate for grasping and relocating all objects to over 95%, compared to 68% obtained with the baseline NPG-B. Our method not only performs better quantitatively; the qualitative results also indicate that, for objects such as an apple, a water bottle, and a lightbulb, incorporating hand pose priors during learning allows more natural hand grasping.
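The abstract describes the shaped reward only at a high level. As a minimal sketch (an illustration under assumptions, not the authors' implementation), the prior term can be read as an extra reward that keeps the hand's joint configuration close to a demonstrated grasp pose during the early timesteps of each episode; the names task_reward, pose_prior, prior_horizon, and weight below are placeholders, not values from the paper.

```python
import numpy as np

def shaped_reward(task_reward, hand_joint_angles, pose_prior,
                  timestep, prior_horizon=50, weight=0.5):
    """Sketch of a shaped reward: the usual task reward plus a bonus
    for staying close to a demonstrated hand pose prior, applied only
    during the early timesteps of an episode.

    task_reward       -- environment reward for the grasp/relocate task
    hand_joint_angles -- current joint angles of the anthropomorphic hand
    pose_prior        -- demonstrated joint angles for this object's grasp
    timestep          -- current step within the episode
    prior_horizon     -- number of early steps where the prior is active (assumed)
    weight            -- trade-off between task reward and pose matching (assumed)
    """
    reward = task_reward
    if timestep < prior_horizon:
        # Reward closeness to the demonstrated hand pose; the bonus decays
        # with the pose error and is bounded in (0, 1].
        pose_error = np.linalg.norm(hand_joint_angles - pose_prior)
        reward += weight * np.exp(-pose_error)
    return reward
```

In the paper this shaped reward is combined with the on-policy Natural Policy Gradient algorithm (NPG+SR); the exact form of the pose-matching term, its weight, and the number of early timesteps over which it is applied are design choices not specified in the abstract.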
Pages: 167 - 171
Number of pages: 5