Model-Free Imitation Learning with Policy Optimization

Citations: 0

Authors:

Ho, Jonathan [1]

Gupta, Jayesh K. [1]

Ermon, Stefano [1]

Affiliations:

[1] Stanford Univ, Stanford, CA 94305 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48 | 2016 / Vol. 48

Funding:

National Science Foundation (USA);

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.
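The apprenticeship learning formalism named in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, a class of linear costs c_w(s, a) = w · phi(s, a) with ||w||_2 <= 1, where phi is a hypothetical feature map; the abstract itself does not specify the cost class. Under that assumption, the worst-case gap between the policy and the expert equals the Euclidean distance between their discounted feature expectations. The function names and the discount factor below are illustrative choices, not taken from the paper, and the policy-gradient step is not shown.

```python
import math


def feature_expectations(trajectories, gamma=0.99):
    """Average discounted feature sum over sample trajectories.

    Each trajectory is a list of feature vectors phi(s_t, a_t),
    given as lists of floats (a hypothetical representation).
    """
    dim = len(trajectories[0][0])
    mu = [0.0] * dim
    for traj in trajectories:
        for t, phi in enumerate(traj):
            for i in range(dim):
                mu[i] += (gamma ** t) * phi[i]
    return [m / len(trajectories) for m in mu]


def worst_case_cost(mu_policy, mu_expert):
    """Return (w, gap) for the assumed linear cost class.

    w maximizes w . (mu_policy - mu_expert) subject to ||w||_2 <= 1,
    and gap is the maximal value, i.e. the L2 norm of the difference.
    """
    diff = [p - e for p, e in zip(mu_policy, mu_expert)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        # Policy already matches the expert's feature expectations.
        return [0.0] * len(diff), 0.0
    return [d / norm for d in diff], norm


if __name__ == "__main__":
    expert = feature_expectations([[[1.0, 0.0], [0.0, 1.0]]], gamma=0.9)
    policy = feature_expectations([[[0.0, 1.0], [0.0, 1.0]]], gamma=0.9)
    w, gap = worst_case_cost(policy, expert)
    print("worst-case cost weights:", w, "gap:", gap)
```

A gap of zero means the policy incurs the same expected cost as the expert for every cost in the assumed class, so driving the gap down is one way to read "performs at least as well as an expert policy on an unknown cost function".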


Pages: 10

50 records in total

[1] Model-free Policy Learning with Reward Gradients
Lan, Qingfong
Tosatto, Samuele
Farrahi, Homayoon
Mahmood, A. Rupam
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
[2] Model-Free Trajectory Optimization for Reinforcement Learning
Akrour, Riad
Abdolmaleki, Abbas
Abdulsamad, Hany
Neumann, Gerhard
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
[3] Model-Free Inverse H-Infinity Control for Imitation Learning
Xue, Wenqian
Lian, Bosen
Kartal, Yusuf
Fan, Jialu
Chai, Tianyou
Lewis, Frank L.
IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2025, 22 : 5661 - 5672
[4] Dyna-style Model-based reinforcement learning with Model-Free Policy Optimization
Dong, Kun
Luo, Yongle
Wang, Yuxin
Liu, Yu
Qu, Chengeng
Zhang, Qiang
Cheng, Erkang
Sun, Zhiyong
Song, Bo
KNOWLEDGE-BASED SYSTEMS, 2024, 287
[5] Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey
Liu, Yongshuai
Halev, Avishai
Liu, Xin
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 4508 - 4515
[6] Constrained model-free reinforcement learning for process optimization
Pan, Elton
Petsagkourakis, Panagiotis
Mowbray, Max
Zhang, Dongda
del Rio-Chanona, Ehecatl Antonio
COMPUTERS & CHEMICAL ENGINEERING, 2021, 154
[7] Model-Free Unsupervised Learning for Optimization Problems with Constraints
Sun, Chengjian
Liu, Dong
Yang, Chenyang
PROCEEDINGS OF 2019 25TH ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS (APCC), 2019, : 392 - 397
[8] Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis
Lin, Mingduo
Zhao, Bo
Liu, Derong
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 12
[9] Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis
Lin, Mingduo
Zhao, Bo
Liu, Derong
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (03) : 5574 - 5585
[10] Optimal Online Learning Procedures for Model-Free Policy Evaluation
Ueno, Tsuyoshi
Maeda, Shin-ichi
Kawanabe, Motoaki
Ishii, Shin
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 473 - +
