Self-Supervised Exploration via Disagreement

Cited by: 0
Authors
Pathak, Deepak [1]
Gandhi, Dhiraj [2]
Gupta, Abhinav [2, 3]
Affiliations
[1] UC Berkeley, Berkeley, CA 94720 USA
[2] CMU, Pittsburgh, PA USA
[3] Facebook AI Res, Menlo Pk, CA USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Efficient exploration is a long-standing problem in sensorimotor learning. Major advances have been demonstrated in noise-free, non-stochastic domains such as video games and simulation. However, most of these formulations either get stuck in environments with stochastic dynamics or are too inefficient to scale to real robotics setups. In this paper, we propose a formulation for exploration inspired by work in the active learning literature. Specifically, we train an ensemble of dynamics models and incentivize the agent to explore such that the disagreement among the models in the ensemble is maximized. This allows the agent to learn skills by exploring in a self-supervised manner without any external reward. Notably, we further leverage the disagreement objective to optimize the agent's policy in a differentiable manner, without using reinforcement learning, which results in sample-efficient exploration. We demonstrate the efficacy of this formulation across a variety of benchmark environments including stochastic-Atari, MuJoCo and Unity. Finally, we implement our differentiable exploration on a real robot which learns to interact with objects completely from scratch. Project videos and code are at https://pathak22.github.io/exploration-by-disagreement/.
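The disagreement signal described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that disagreement is measured as the variance of the ensemble's next-state predictions, and it replaces learned dynamics models with random linear maps. The names `make_linear_model` and `disagreement_reward` are hypothetical and introduced only for this illustration.

```python
import random


def make_linear_model(seed, state_dim, action_dim):
    """Hypothetical stand-in for one learned dynamics model: a random linear map."""
    rng = random.Random(seed)
    weights = [
        [rng.uniform(-1.0, 1.0) for _ in range(state_dim + action_dim)]
        for _ in range(state_dim)
    ]

    def predict(state, action):
        # Predict the next state from the concatenated (state, action) input.
        x = list(state) + list(action)
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return predict


def disagreement_reward(models, state, action):
    """Assumed disagreement measure: variance across the ensemble's
    predictions, averaged over the output dimensions."""
    preds = [model(state, action) for model in models]
    n = len(preds)
    dim = len(preds[0])
    total = 0.0
    for d in range(dim):
        values = [p[d] for p in preds]
        mean = sum(values) / n
        total += sum((v - mean) ** 2 for v in values) / n
    return total / dim


# Five differently initialised models disagree on this input.
ensemble = [make_linear_model(seed, state_dim=2, action_dim=1) for seed in range(5)]
reward_diverse = disagreement_reward(ensemble, [0.5, -0.2], [1.0])

# Five copies of the same model agree everywhere, so the reward vanishes.
identical = [ensemble[0]] * 5
reward_identical = disagreement_reward(identical, [0.5, -0.2], [1.0])
```

In this toy setting the reward is positive where the models' predictions differ and close to zero where they coincide, so an agent maximizing it would be drawn to state-action pairs that the ensemble has not yet learned to predict consistently.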
Pages: 10