Model-Free Imitation Learning with Policy Optimization

Cited by: 0
Authors
Ho, Jonathan [1]
Gupta, Jayesh K. [1]
Ermon, Stefano [1]
Affiliation
[1] Stanford University, Stanford, CA 94305, USA
Funding
U.S. National Science Foundation
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.
Pages: 10
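
The abstract describes the approach only at a high level, so below is a minimal illustrative sketch, in Python, of apprenticeship learning driven by policy gradients. Everything in it beyond what the abstract states is an assumption made for illustration: the toy chain MDP, the one-hot features phi(s, a), the linear cost class c_w(s, a) = w . phi(s, a) with ||w||_2 <= 1, the tabular softmax policy, and the plain REINFORCE update are stand-ins, not the paper's actual environments, cost class, or gradient estimator.

"""
A minimal sketch of apprenticeship learning with policy gradients, in the
spirit of the abstract above. The toy chain MDP, one-hot features, linear
cost class, tabular softmax policy, and REINFORCE estimator are all
assumptions for illustration, not taken from the paper itself.
"""
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, HORIZON = 5, 2, 20

def phi(s, a):
    # One-hot state-action features; the cost class is linear in these.
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f

def step(s, a):
    # Toy chain dynamics: action 1 moves right, action 0 moves left.
    return min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)

def rollout(policy_probs):
    # Sample one trajectory; return its (state, action) pairs and feature sum.
    s, traj, feats = 0, [], np.zeros(N_STATES * N_ACTIONS)
    for _ in range(HORIZON):
        a = rng.choice(N_ACTIONS, p=policy_probs[s])
        traj.append((s, a))
        feats += phi(s, a)
        s = step(s, a)
    return traj, feats

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# "Expert" feature expectations, simulated here from a hand-coded policy
# that mostly moves right (a stand-in for real expert demonstrations).
expert_probs = np.tile([0.05, 0.95], (N_STATES, 1))
mu_expert = np.mean([rollout(expert_probs)[1] for _ in range(200)], axis=0)

theta = np.zeros((N_STATES, N_ACTIONS))   # tabular softmax policy parameters
lr, n_rollouts = 0.05, 50

for it in range(200):
    probs = softmax(theta)
    trajs, feat_sums = zip(*[rollout(probs) for _ in range(n_rollouts)])
    mu_policy = np.mean(feat_sums, axis=0)

    # Worst-case cost in the linear class ||w||_2 <= 1: w points along the
    # feature-expectation gap, penalizing exactly where the learner visits
    # more often than the expert.
    gap = mu_policy - mu_expert
    w = gap / (np.linalg.norm(gap) + 1e-8)

    # REINFORCE step to decrease E_pi[ sum_t c_w(s_t, a_t) ].
    grad = np.zeros_like(theta)
    for traj, feats in zip(trajs, feat_sums):
        ret = w @ feats                   # trajectory cost under c_w
        for s, a in traj:
            g = -probs[s].copy()
            g[a] += 1.0                   # gradient of log softmax at (s, a)
            grad[s] += g * ret
    theta -= lr * grad / n_rollouts

# Gap measured on the last batch of rollouts.
print("feature-expectation gap:", np.linalg.norm(mu_policy - mu_expert))

The one design choice worth noting in this sketch is the inner maximization: for a ball-constrained linear cost class, the worst-case weight vector has the closed form used above (proportional to the feature-expectation gap), so each iteration reduces to one cost update followed by one policy-gradient step, and the environment enters only through sampled rollouts, with no transition model learned or required.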