Provably Efficient Imitation Learning from Observation Alone

Cited by: 0
Authors
Sun, Wen [1]
Vemula, Anirudh [1]
Boots, Byron [2]
Bagnell, J. Andrew [3]
Affiliations
[1] Carnegie Mellon University, Robotics Institute, Pittsburgh, PA 15213, USA
[2] Georgia Institute of Technology, College of Computing, Atlanta, GA 30332, USA
[3] Aurora Innovation, Pittsburgh, PA, USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric (IPM) between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in the ILFO setting: it learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample-efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.
Pages: 10
Related Papers
50 items in total
  • [41] Bai, Yu; Xie, Tengyang; Jiang, Nan; Wang, Yu-Xiang. Provably Efficient Q-Learning with Low Switching Cost. Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019.
  • [42] Wang, Lingxiao; Yang, Zhuoran; Wang, Zhaoran. Provably Efficient Causal Reinforcement Learning with Confounded Observational Data. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [43] Uehara, Masatoshi; Sekhari, Ayush; Kallus, Nathan; Lee, Jason D.; Sun, Wen. Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [44] Zhou, Dongruo; He, Jiafan; Gu, Quanquan. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping. International Conference on Machine Learning, Vol. 139, 2021.
  • [45] Cipollone, Roberto; Jonsson, Anders; Ronca, Alessandro; Talebi, Mohammad Sadegh. Provably Efficient Offline Reinforcement Learning in Regular Decision Processes. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [46] Doublet, Thomas; Nosrati, Mona; Kentros, Clifford G. Social Learning of a Spatial Task by Observation Alone. Frontiers in Behavioral Neuroscience, 2022, 16.
  • [47] Xu, Tengyu; Wang, Yue; Zou, Shaofeng; Liang, Yingbin. Provably Efficient Offline Reinforcement Learning With Trajectory-Wise Reward. IEEE Transactions on Information Theory, 2024, 70(9): 6481-6518.
  • [48] Huang, Hsin-Yuan; Kueng, Richard; Torlai, Giacomo; Albert, Victor V.; Preskill, John. Provably efficient machine learning for quantum many-body problems. Science, 2022, 377(6613): 1397+.
  • [49] Wai, Hoi-To; Yang, Zhuoran; Wang, Zhaoran. Provably Efficient Neural GTD Algorithm for Off-policy Learning. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020.
  • [50] Zhang, Chicheng; Wang, Zhi. Provably Efficient Multi-Task Reinforcement Learning with Model Transfer. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.