Self-Supervised Exploration via Disagreement

Cited by: 0
Authors
Pathak, Deepak [1]
Gandhi, Dhiraj [2]
Gupta, Abhinav [2, 3]
Affiliations
[1] UC Berkeley, Berkeley, CA 94720 USA
[2] CMU, Pittsburgh, PA USA
[3] Facebook AI Res, Menlo Pk, CA USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Efficient exploration is a long-standing problem in sensorimotor learning. Major advances have been demonstrated in noise-free, non-stochastic domains such as video games and simulation. However, most of these formulations either get stuck in environments with stochastic dynamics or are too inefficient to scale to real robotics setups. In this paper, we propose a formulation for exploration inspired by work in the active learning literature. Specifically, we train an ensemble of dynamics models and incentivize the agent to explore such that the disagreement among the ensemble members is maximized. This allows the agent to learn skills by exploring in a self-supervised manner without any external reward. Notably, we further leverage the disagreement objective to optimize the agent's policy in a differentiable manner, without using reinforcement learning, which results in sample-efficient exploration. We demonstrate the efficacy of this formulation across a variety of benchmark environments including stochastic-Atari, MuJoCo and Unity. Finally, we implement our differentiable exploration on a real robot which learns to interact with objects completely from scratch. Project videos and code are at https://pathak22.github.io/exploration-by-disagreement/.
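The disagreement signal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes disagreement is measured as the variance of the ensemble's next-state predictions, and it uses random linear maps as stand-ins for trained dynamics models. All names here (`disagreement_reward`, `predict_all`, `weights`) are hypothetical.

```python
import numpy as np


def disagreement_reward(predictions):
    """Intrinsic reward from ensemble disagreement.

    predictions: array of shape (n_models, batch, feat_dim), one next-state
    prediction per ensemble member.
    Returns an array of shape (batch,): the variance across ensemble members,
    averaged over feature dimensions (an assumed disagreement measure).
    """
    return predictions.var(axis=0).mean(axis=-1)


rng = np.random.default_rng(0)
n_models, state_dim, action_dim = 5, 4, 2

# Hypothetical ensemble: each member is a random linear map standing in
# for a trained forward-dynamics model f_i(state, action) -> next state.
weights = rng.normal(size=(n_models, state_dim + action_dim, state_dim))


def predict_all(states, actions):
    """Run every ensemble member on a batch of (state, action) pairs."""
    inputs = np.concatenate([states, actions], axis=-1)  # (batch, s + a)
    # Result has shape (n_models, batch, state_dim).
    return np.einsum("bi,mij->mbj", inputs, weights)


states = rng.normal(size=(3, state_dim))
actions = rng.normal(size=(3, action_dim))
rewards = disagreement_reward(predict_all(states, actions))
```

Under this assumed measure, transitions on which the ensemble members agree yield a reward near zero, while transitions they predict differently yield a larger reward, so an agent maximizing it is drawn toward states the models have not yet learned.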
Pages: 10
Related papers
50 in total
  • [31] Self-supervised sub-category exploration for Pseudo label generation
    Chern, Wei-Chih
    Kim, Taegeon
    Nguyen, Tam V.
    Asari, Vijayan K.
    Kim, Hongjo
    AUTOMATION IN CONSTRUCTION, 2023, 151
  • [32] Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation
    Liang, Jiachen
    Hou, Ruibing
    Chang, Hong
    Ma, Bingpeng
    Shan, Shiguang
    Chen, Xilin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [33] Exploration of distributed self-supervised training optimization strategies in visual tasks
    Zhang, Xi
    Wang, Bo
    Chen, Jiangqi
    Wang, Jin
    Chen, Xia
    INTERNATIONAL JOURNAL OF LOW-CARBON TECHNOLOGIES, 2024, 19 : 2667 - 2675
  • [34] EXPLORATION OF LANGUAGE DEPENDENCY FOR JAPANESE SELF-SUPERVISED SPEECH REPRESENTATION MODELS
    Ashihara, Takanori
    Moriya, Takafumi
    Matsuura, Kohei
    Tanaka, Tomohiro
    arXiv, 2023,
  • [35] Monocular Depth Estimation via Self-Supervised Self-Distillation
    Hu, Haifeng
    Feng, Yuyang
    Li, Dapeng
    Zhang, Suofei
    Zhao, Haitao
    SENSORS, 2024, 24 (13)
  • [36] Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization
    Gur, Shir
    Ali, Ameen
    Wolf, Lior
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 11545 - 11554
  • [37] Self-supervised AutoFlow
    Huang, Hsin-Ping
    Herrmann, Charles
    Hur, Junhwa
    Lu, Erika
    Sargent, Kyle
    Stone, Austin
    Yang, Ming-Hsuan
    Sun, Deqing
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11412 - 11421
  • [38] Self-supervised ARTMAP
    Amis, Gregory P.
    Carpenter, Gail A.
    NEURAL NETWORKS, 2010, 23 (02) : 265 - 282
  • [39] SELF-SUPERVISED SPEAKER VERIFICATION WITH SIMPLE SIAMESE NETWORK AND SELF-SUPERVISED REGULARIZATION
    Sang, Mufan
    Li, Haoqi
    Liu, Fang
    Arnold, Andrew O.
    Wan, Li
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6127 - 6131
  • [40] Audio Mixing Inversion via Embodied Self-supervised Learning
    Zhou, Haotian
    Yu, Feng
    Wu, Xihong
    MACHINE INTELLIGENCE RESEARCH, 2024, 21 (01) : 55 - 62