Model-based contextual policy search for data-efficient generalization of robot skills

Cited by: 38
Authors
Kupcsik, Andras [1 ,2 ]
Deisenroth, Marc Peter [5 ]
Peters, Jan [3 ,4 ]
Poh, Loh Ai [1 ]
Vadakkepat, Prahlad [1 ]
Neumann, Gerhard [3 ]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, 4 Engn Dr 3, Singapore 118571, Singapore
[2] Natl Univ Singapore, Sch Comp, 13 Comp Dr, Singapore 117417, Singapore
[3] Tech Univ Darmstadt, Fachbereich Informat, Fachgebiet Intelligente Autonome Syst, Hsch Str 10, D-64289 Darmstadt, Germany
[4] Max Planck Inst Intelligent Syst, Spemannstr 38, D-72076 Tübingen, Germany
[5] Imperial Coll London, Dept Comp, 180 Queens Gate, London SW7 2AZ, England
Keywords
Robotics; Reinforcement learning; Contextual policy search; Model-based policy search; Robot skill generalization; Gaussian processes; Movement primitives; Robot table tennis; Robot hockey;
DOI
10.1016/j.artint.2014.11.005
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we use a hierarchical approach and learn an upper-level policy that generalizes the lower-level controllers to new contexts. A common way to learn such upper-level policies is policy search. However, the majority of current contextual policy search approaches are model-free and require a high number of interactions with the robot and its environment. Model-based approaches are known to significantly reduce the number of robot experiments; however, current model-based techniques cannot be applied straightforwardly to the problem of learning contextual upper-level policies. They rely on specific parametrizations of the policy and the reward function, which are often unrealistic in the contextual policy search formulation. In this paper, we propose a novel model-based contextual policy search algorithm that generalizes lower-level controllers and is data-efficient. Our approach is based on learned probabilistic forward models and information-theoretic policy search. Unlike current algorithms, our method does not require any assumptions about the parametrization of the policy or the reward function. We show on complex simulated robotic tasks and in a real robot experiment that the proposed learning framework speeds up the learning process by up to two orders of magnitude compared to existing methods, while learning high-quality policies. (C) 2014 Elsevier B.V. All rights reserved.
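To make the hierarchical setup described in the abstract concrete, the following Python sketch implements a toy episodic contextual policy search loop: an upper-level Gaussian policy pi(omega | s) = N(omega | a + A s, Sigma) maps a context s to lower-level controller parameters omega and is refit from exponentially reward-weighted samples. This is only an illustrative approximation of an information-theoretic (REPS-style) update, not the authors' GPREPS algorithm; the toy reward, the fixed temperature eta (REPS would obtain it from a KL-bound dual problem), and the stand-in rollout function (which in the paper would be replaced by artificial rollouts generated from learned Gaussian-process forward models) are all assumptions made for the example.

# Minimal sketch (not the authors' implementation): contextual upper-level
# policy pi(omega | s) = N(omega | a + A s, Sigma), refit by reward-weighted
# maximum likelihood in the spirit of episodic contextual policy search.
import numpy as np

rng = np.random.default_rng(0)
dim_s, dim_w = 2, 3                      # context and controller-parameter dimensions

# Upper-level policy parameters: omega ~ N(a + A s, Sigma)
a = np.zeros(dim_w)
A = np.zeros((dim_w, dim_s))
Sigma = np.eye(dim_w)

def simulated_return(s, w):
    """Stand-in for executing the lower-level controller with parameters w
    in context s (or for an artificial rollout from a learned forward model).
    The quadratic reward around a context-dependent optimum is an assumption."""
    w_star = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]) @ s
    return -np.sum((w - w_star) ** 2)

eta = 1.0                                # fixed temperature instead of the REPS dual solution
for it in range(50):
    # Sample contexts and controller parameters from the current upper-level policy.
    S = rng.uniform(-1.0, 1.0, size=(200, dim_s))
    W = np.stack([rng.multivariate_normal(a + A @ s, Sigma) for s in S])
    R = np.array([simulated_return(s, w) for s, w in zip(S, W)])

    # Exponential reward weighting (normalized, shifted for numerical stability).
    d = np.exp((R - R.max()) / eta)
    d /= d.sum()

    # Weighted linear regression of omega on [1, s] gives the new policy mean.
    X = np.hstack([np.ones((len(S), 1)), S])
    DX = X * d[:, None]
    beta = np.linalg.solve(X.T @ DX + 1e-6 * np.eye(dim_s + 1), DX.T @ W)
    a, A = beta[0], beta[1:].T

    # Weighted covariance of the residuals gives the new exploration noise.
    E = W - X @ beta
    Sigma = (E * d[:, None]).T @ E + 1e-6 * np.eye(dim_w)

print("mean return of the policy mean:",
      np.mean([simulated_return(s, a + A @ s) for s in S]))

In the paper's setting, the key difference is where the returns R come from: a model-based variant learns probabilistic forward models of the robot and task from the few real rollouts and then generates large numbers of artificial samples from those models to perform the policy update, which is what makes the approach data-efficient.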
Pages: 415-439
Page count: 25
Related Papers
50 records in total
  • [1] DATA-EFFICIENT MODEL-BASED REINFORCEMENT LEARNING FOR ROBOT CONTROL
    Sun, Ming
    Gao, Yue
    Liu, Wei
    Li, Shaoyuan
    INTERNATIONAL JOURNAL OF ROBOTICS & AUTOMATION, 2021, 36 (04): : 211 - 218
  • [2] Data-Efficient Task Generalization via Probabilistic Model-Based Meta Reinforcement Learning
    Bhardwaj, Arjun
    Rothfuss, Jonas
    Sukhija, Bhavya
    As, Yarden
    Hutter, Marco
    Coros, Stelian
    Krause, Andreas
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (04) : 3918 - 3925
  • [3] Data-efficient model-based reinforcement learning with trajectory discrimination
    Tuo Qu
    Fuqing Duan
    Junge Zhang
    Bo Zhao
    Wenzhen Huang
    Complex & Intelligent Systems, 2024, 10 : 1927 - 1936
  • [4] Data-efficient model-based reinforcement learning with trajectory discrimination
    Qu, Tuo
    Duan, Fuqing
    Zhang, Junge
    Zhao, Bo
    Huang, Wenzhen
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (02) : 1927 - 1936
  • [5] Data-Efficient Policy Evaluation Through Behavior Policy Search
    Hanna, Josiah P.
    Chandak, Yash
    Thomas, Philip S.
    White, Martha
    Stone, Peter
    Niekum, Scott
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 58
  • [6] Optimizing Traffic Control with Model-Based Learning: A Pessimistic Approach to Data-Efficient Policy Inference
    Kunjir, Mayuresh
    Chawla, Sanjay
    Chandrasekar, Siddarth
    Jay, Devika
    Ravindran, Balaraman
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 1176 - 1187
  • [7] Data-Efficient Policy Evaluation Through Behavior Policy Search
    Hanna, Josiah P.
    Thomas, Philip S.
    Stone, Peter
    Niekum, Scott
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [8] Fast Model Identification via Physics Engines for Data-Efficient Policy Search
    Zhu, Shaojun
    Kimmel, Andrew
    Bekris, Kostas E.
    Boularias, Abdeslam
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 3249 - 3256
  • [9] Black-Box Data-efficient Policy Search for Robotics
    Chatzilygeroudis, Konstantinos
    Rama, Roberto
    Kaushik, Rituraj
    Goepp, Dorian
    Vassiliades, Vassilis
    Mouret, Jean-Baptiste
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 51 - 58
  • [10] Identifying Ordinary Differential Equations for Data-efficient Model-based Reinforcement Learning
    Nagel, Tobias
    Huber, Marco F.
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024,