Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards

Cited by: 0
Authors
Metcalf, Katherine [1 ]
Sarabia, Miguel [1 ]
Mackraz, Natalie [1 ]
Theobald, Barry-John [1 ]
Affiliations
[1] Apple, Cupertino, CA 95014 USA
Keywords
human-in-the-loop learning; preference-based RL; RLHF
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Preference-based reinforcement learning (PbRL) aligns robot behavior with human preferences via a reward function learned from binary feedback over agent behaviors. We show that dynamics-aware reward functions improve the sample efficiency of PbRL by an order of magnitude. In our experiments we iterate between: (1) learning a dynamics-aware state-action representation z^{sa} via a self-supervised temporal consistency task, and (2) bootstrapping the preference-based reward function from z^{sa}, which results in faster policy learning and better final policy performance. For example, on quadruped-walk, walker-walk, and cheetah-run, with 50 preference labels we achieve the same performance as existing approaches with 500 preference labels, and we recover 83% and 66% of ground-truth reward policy performance versus only 38% and 21% for reward functions learned without the dynamics-aware representation. These performance gains demonstrate the benefits of explicitly learning a dynamics-aware reward model. Repo: https://github.com/apple/ml-reed.
Pages: 49
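
As a concrete illustration of the two-step loop described in the abstract, here is a minimal PyTorch sketch. It is not the authors' implementation (see the linked repo for that): it assumes a SimSiam-style negative-cosine temporal-consistency objective for step (1) and the standard Bradley-Terry preference loss over trajectory segments for step (2), and all names (DynamicsAwareReward, consistency_loss, preference_loss) are hypothetical.

# Illustrative sketch only -- not the authors' implementation.
# Assumptions: continuous states/actions, a SimSiam-style temporal-consistency
# objective for step (1), and a Bradley-Terry preference loss for step (2).
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class DynamicsAwareReward(nn.Module):
    """Encodes (s, a) into z_sa, predicts the next-state embedding for the
    self-supervised task, and maps z_sa to a scalar preference reward."""
    def __init__(self, state_dim, action_dim, z_dim=64):
        super().__init__()
        self.sa_enc = mlp(state_dim + action_dim, z_dim)   # z_sa = f(s_t, a_t)
        self.state_enc = mlp(state_dim, z_dim)             # embeds the target s_{t+1}
        self.predictor = mlp(z_dim, z_dim)                 # z_sa -> predicted next-state embedding
        self.reward_head = mlp(z_dim, 1)                   # reward bootstrapped from z_sa

    def z_sa(self, s, a):
        return self.sa_enc(torch.cat([s, a], dim=-1))

    def reward(self, s, a):
        return self.reward_head(self.z_sa(s, a))

    def consistency_loss(self, s, a, s_next):
        # Step (1): the (s_t, a_t) embedding must predict the embedding of the
        # observed next state (stop-gradient on the target branch).
        pred = F.normalize(self.predictor(self.z_sa(s, a)), dim=-1)
        target = F.normalize(self.state_enc(s_next), dim=-1).detach()
        return -(pred * target).sum(dim=-1).mean()         # negative cosine similarity

    def preference_loss(self, seg0, seg1, label):
        # Step (2): Bradley-Terry cross-entropy over a pair of segments, each a
        # (states, actions) tuple shaped (batch, time, dim); label is 1.0 when
        # the annotator prefers seg1 over seg0.
        r0 = self.reward(*seg0).sum(dim=1).squeeze(-1)     # summed segment return
        r1 = self.reward(*seg1).sum(dim=1).squeeze(-1)
        return F.binary_cross_entropy_with_logits(r1 - r0, label)

An outer loop would then alternate: fit the encoders on unlabeled environment transitions with consistency_loss, fit the reward head on the small preference dataset with preference_loss, and update the policy (e.g., SAC) against model.reward(s, a).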