Hierarchical Imitation Learning for Stochastic Environments

Cited by: 0
Authors
Igl, Maximilian [1]
Shah, Punit [1]
Mougin, Paul [1]
Srinivasan, Sirish [1]
Gupta, Tarun [1]
White, Brandyn [1]
Shiarlis, Kyriacos [1]
Whiteson, Shimon [1]
Affiliation
[1] Waymo Res, Mountain View, CA 94043 USA
DOI
10.1109/IROS55552.2023.10341451
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many applications of imitation learning require the agent to generate the full distribution of behaviour observed in the training data. For example, to evaluate the safety of autonomous vehicles in simulation, accurate and diverse behaviour models of other road users are paramount. Existing methods that improve this distributional realism typically rely on hierarchical policies. These condition the policy on types such as goals or personas that give rise to multi-modal behaviour. However, such methods are often inappropriate for stochastic environments where the agent must also react to external factors: because agent types are inferred from the observed future trajectory during training, these environments require that the contributions of internal and external factors to the agent behaviour are disentangled and only internal factors, i.e., those under the agent's control, are encoded in the type. Encoding future information about external factors leads to inappropriate agent reactions during testing, when the future is unknown and types must be drawn independently from the actual future. We formalize this challenge as distribution shift in the conditional distribution of agent types under environmental stochasticity. We propose Robust Type Conditioning (RTC), which eliminates this shift with adversarial training under randomly sampled types. Experiments on two domains, including the large-scale Waymo Open Motion Dataset, show improved distributional realism while maintaining or improving task performance compared to state-of-the-art baselines.
Pages: 1697-1704
Number of pages: 8