Model-Free Imitation Learning with Policy Optimization

Cited by: 0
Authors
Ho, Jonathan [1 ]
Gupta, Jayesh K. [1 ]
Ermon, Stefano [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
Funding
U.S. National Science Foundation;
Keywords
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.
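The abstract describes a game between an adversarially chosen cost and a policy-gradient learner that matches the expert under the apprenticeship learning formalism. The following is a minimal illustrative sketch of that idea only, not the paper's actual algorithm: a linear cost over hand-coded features plays gradient ascent against a softmax policy in a toy two-action problem. The feature map, the expert feature expectations `mu_expert`, and the step sizes are all invented for illustration.

```python
import numpy as np

# Toy two-action environment with one-hot features phi(a).
PHI = np.eye(2)

# Hypothetical expert feature expectations: the expert picks action 1 ~90% of the time.
mu_expert = np.array([0.1, 0.9])

def policy_probs(theta):
    """Softmax policy over the two actions (numerically stabilized)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

theta = np.zeros(2)          # policy parameters
w = np.zeros(2)              # linear cost parameters: c_w(a) = w . phi(a)
alpha = beta = 0.1           # step sizes (illustrative choices)
mu_sum = np.zeros(2)
n_iters = 5000

for _ in range(n_iters):
    p = policy_probs(theta)
    mu_pi = p @ PHI                       # policy feature expectations
    mu_sum += mu_pi
    # Cost player ascends: raise the cost of features the policy over-visits
    # relative to the expert.
    w += beta * (mu_pi - mu_expert)
    w /= max(1.0, np.linalg.norm(w))      # project w back into the unit ball
    # Policy player descends the current cost; in this tiny setting the
    # policy gradient of the expected cost is available in closed form.
    costs = PHI @ w                       # c_w(a) for each action
    baseline = p @ costs                  # expected cost under the policy
    theta -= alpha * p * (costs - baseline)

mu_avg = mu_sum / n_iters                 # averaged feature expectations
print(np.round(mu_avg, 2))
```

Averaging the iterates is used here because plain descent-ascent on the underlying bilinear game oscillates; the averaged feature expectations drift toward the expert's. In the paper's large-scale setting the closed-form gradient would be replaced by sampled trajectories.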
Pages: 10
Related Papers
50 records in total
  • [1] Model-free Policy Learning with Reward Gradients
    Lan, Qingfeng
    Tosatto, Samuele
    Farrahi, Homayoon
    Mahmood, A. Rupam
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [2] Model-Free Trajectory Optimization for Reinforcement Learning
    Akrour, Riad
    Abdolmaleki, Abbas
    Abdulsamad, Hany
    Neumann, Gerhard
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [3] Model-Free Inverse H-Infinity Control for Imitation Learning
    Xue, Wenqian
    Lian, Bosen
    Kartal, Yusuf
    Fan, Jialu
    Chai, Tianyou
    Lewis, Frank L.
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 5661 - 5672
  • [4] Dyna-style Model-based reinforcement learning with Model-Free Policy Optimization
    Dong, Kun
    Luo, Yongle
    Wang, Yuxin
    Liu, Yu
    Qu, Chengeng
    Zhang, Qiang
    Cheng, Erkang
    Sun, Zhiyong
    Song, Bo
    KNOWLEDGE-BASED SYSTEMS, 2024, 287
  • [5] Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey
    Liu, Yongshuai
    Halev, Avishai
    Liu, Xin
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 4508 - 4515
  • [6] Constrained model-free reinforcement learning for process optimization
    Pan, Elton
    Petsagkourakis, Panagiotis
    Mowbray, Max
    Zhang, Dongda
    del Rio-Chanona, Ehecatl Antonio
    COMPUTERS & CHEMICAL ENGINEERING, 2021, 154
  • [7] Model-Free Unsupervised Learning for Optimization Problems with Constraints
    Sun, Chengjian
    Liu, Dong
    Yang, Chenyang
    PROCEEDINGS OF 2019 25TH ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS (APCC), 2019, : 392 - 397
  • [8] Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis
    Lin, Mingduo
    Zhao, Bo
    Liu, Derong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 12
  • [9] Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis
    Lin, Mingduo
    Zhao, Bo
    Liu, Derong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (03) : 5574 - 5585
  • [10] Optimal Online Learning Procedures for Model-Free Policy Evaluation
    Ueno, Tsuyoshi
    Maeda, Shin-ichi
    Kawanabe, Motoaki
    Ishii, Shin
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 473+