Model-based contextual policy search for data-efficient generalization of robot skills

Cited by: 38
Authors
Kupcsik, Andras [1 ,2 ]
Deisenroth, Marc Peter [5 ]
Peters, Jan [3 ,4 ]
Poh, Loh Ai [1 ]
Vadakkepat, Prahlad [1 ]
Neumann, Gerhard [3 ]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, 4 Engn Dr 3, Singapore 118571, Singapore
[2] Natl Univ Singapore, Sch Comp, 13 Comp Dr, Singapore 117417, Singapore
[3] Tech Univ Darmstadt, Fachbereich Informat, Fachgebiet Intelligente Autonome Syst, Hsch Str 10, D-64289 Darmstadt, Germany
[4] Max Planck Inst Intelligent Syst, Spemannstr 38, D-72076 Tubingen, Germany
[5] Imperial Coll London, Dept Comp, 180 Queens Gate, London SW7 2AZ, England
Keywords
Robotics; Reinforcement learning; Contextual policy search; Model-based policy search; Robot skill generalization; Gaussian processes; Movement primitives; Robot table tennis; Robot hockey;
DOI
10.1016/j.artint.2014.11.005
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to learning such upper-level policies is to use policy search. However, the majority of current contextual policy search approaches are model-free and require a high number of interactions with the robot and its environment. Model-based approaches are known to significantly reduce the number of robot experiments; however, current model-based techniques cannot be applied straightforwardly to the problem of learning contextual upper-level policies. They rely on specific parametrizations of the policy and the reward function, which are often unrealistic in the contextual policy search formulation. In this paper, we propose a novel model-based contextual policy search algorithm that is able to generalize lower-level controllers and is data-efficient. Our approach is based on learned probabilistic forward models and information-theoretic policy search. Unlike current algorithms, our method does not require any assumption on the parametrization of the policy or the reward function. We show on complex simulated robotic tasks and in a real robot experiment that the proposed learning framework speeds up the learning process by up to two orders of magnitude in comparison to existing methods, while learning high-quality policies. © 2014 Elsevier B.V. All rights reserved.
Pages: 415-439
Number of pages: 25
Related papers
50 items in total
  • [21] MoVie: Visual Model-Based Policy Adaptation for View Generalization
    Yang, Sizhe
    Ze, Yanjie
    Xu, Huazhe
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [22] Gradient-Aware Model-Based Policy Search
    D'Oro, Pierluca
    Metelli, Alberto Maria
    Tirinzoni, Andrea
    Papini, Matteo
    Restelli, Marcello
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 3801 - 3808
  • [23] On the Influence of Time-Correlation in Initial Training Data for Model-Based Policy Search
    Hanna, Elias
    Doncieux, Stephane
    2023 21ST INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS, ICAR, 2023, : 361 - 366
  • [24] Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
    Thomas, Philip S.
    Brunskill, Emma
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [25] Cascaded Gaussian Processes for Data-efficient Robot Dynamics Learning
    Rezaei-Shoshtari, Sahand
    Meger, David
    Sharf, Inna
    2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019, : 6871 - 6877
  • [26] Contextual Policy Search for Linear and Nonlinear Generalization of a Humanoid Walking Controller
    Abbas Abdolmaleki
    Nuno Lau
    Luis Paulo Reis
    Jan Peters
    Gerhard Neumann
    Journal of Intelligent & Robotic Systems, 2016, 83 : 393 - 408
  • [27] Contextual Policy Search for Linear and Nonlinear Generalization of a Humanoid Walking Controller
    Abdolmaleki, Abbas
    Lau, Nuno
    Reis, Luis Paulo
    Peters, Jan
    Neumann, Gerhard
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2016, 83 (3-4) : 393 - 408
  • [28] DEGREE: A Data-Efficient Generation-Based Event Extraction Model
    Hsu, I-Hung
    Huang, Kuan-Hao
    Boschee, Elizabeth
    Miller, Scott
    Natarajan, Premkumar
    Chang, Kai-Wei
    Peng, Nanyun
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 1890 - 1908
  • [29] Model-Based Domain Generalization
    Robey, Alexander
    Pappas, George J.
    Hassani, Hamed
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [30] Provably Efficient Causal Model-Based Reinforcement Learning for Systematic Generalization
    Mutti, Mirco
    De Santi, Riccardo
    Rossi, Emanuele
    Calderon, Juan Felipe
    Bronstein, Michael
    Restelli, Marcello
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 9251 - 9259