Provably Efficient Adversarial Imitation Learning with Unknown Transitions

Citations: 0
Authors
Xu, Tian [1 ,4 ]
Li, Ziniu [2 ,3 ]
Yu, Yang [1 ,4 ]
Luo, Zhi-Quan [2 ,3 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen, Peoples R China
[3] Shenzhen Res Inst Big Data, Shenzhen, Peoples R China
[4] Polixir Ai, Nanjing, Peoples R China
Source
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Imitation learning (IL) has proven to be an effective method for learning good policies from expert demonstrations. Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed. This paper explores the theoretical underpinnings of AIL in this context, where the stochastic and uncertain nature of environment transitions presents a challenge. We examine the expert sample complexity and interaction complexity required to recover good policies. To this end, we establish a framework connecting reward-free exploration and AIL, and propose an algorithm, MB-TAIL, that achieves the minimax optimal expert sample complexity of $\widetilde{O}(H^{3/2}|S|/\varepsilon)$ and interaction complexity of $\widetilde{O}(H^{3}|S|^{2}|A|/\varepsilon^{2})$. Here, $H$ represents the planning horizon, $|S|$ is the state space size, $|A|$ is the action space size, and $\varepsilon$ is the desired imitation gap. MB-TAIL is the first algorithm to achieve this level of expert sample complexity in the unknown transition setting and improves upon the interaction complexity of the best-known algorithm, OAL, by $O(H)$. Additionally, we demonstrate the generalization ability of MB-TAIL by extending it to the function approximation setting and proving that it can achieve expert sample and interaction complexity independent of $|S|$.
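As a rough illustration of the stated $O(H)$ improvement, the interaction bound it implies for OAL can be worked out from the figures in the abstract. This is only a sketch: the OAL expression below is inferred from the abstract's "by $O(H)$" statement rather than quoted from it, and constants and logarithmic factors are ignored.

```latex
\[
\underbrace{\widetilde{O}\!\left(\frac{H^{3}\,|S|^{2}\,|A|}{\varepsilon^{2}}\right)}_{\text{MB-TAIL (stated)}}
\times O(H)
\;=\;
\underbrace{\widetilde{O}\!\left(\frac{H^{4}\,|S|^{2}\,|A|}{\varepsilon^{2}}\right)}_{\text{OAL (inferred)}}
\]
```

Under this reading, with a planning horizon of $H = 10$ the leading terms of the two interaction bounds differ by a factor on the order of 10.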
Pages: 2367-2378
Page count: 12