Self-Supervised Exploration via Disagreement

Cited by: 0

Authors:

Pathak, Deepak [1]

Gandhi, Dhiraj [2]

Gupta, Abhinav [2,3]

Affiliations:

[1] UC Berkeley, Berkeley, CA 94720 USA

[2] CMU, Pittsburgh, PA USA

[3] Facebook AI Res, Menlo Pk, CA USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97 | 2019 / Vol. 97

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Efficient exploration is a long-standing problem in sensorimotor learning. Major advances have been demonstrated in noise-free, non-stochastic domains such as video games and simulation. However, most of these formulations either get stuck in environments with stochastic dynamics or are too inefficient to be scalable to real robotics setups. In this paper, we propose a formulation for exploration inspired by the work in active learning literature. Specifically, we train an ensemble of dynamics models and incentivize the agent to explore such that the disagreement of those ensembles is maximized. This allows the agent to learn skills by exploring in a self-supervised manner without any external reward. Notably, we further leverage the disagreement objective to optimize the agent's policy in a differentiable manner, without using reinforcement learning, which results in a sample-efficient exploration. We demonstrate the efficacy of this formulation across a variety of benchmark environments including stochastic-Atari, Mujoco and Unity. Finally, we implement our differentiable exploration on a real robot which learns to interact with objects completely from scratch. Project videos and code are at https://pathak22.github.io/exploration-by-disagreement/.
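
The disagreement signal described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes disagreement is measured as the variance of next-state predictions across the ensemble, and it stands in randomly initialized linear maps for trained dynamics models. The names disagreement_reward, weights, states, and actions are hypothetical.

```python
import numpy as np


def disagreement_reward(predictions: np.ndarray) -> np.ndarray:
    """Intrinsic reward from ensemble disagreement (illustrative assumption).

    predictions has shape (n_models, batch, state_dim): each model's
    predicted next state. The reward is the variance across models,
    averaged over state dimensions, giving one value per batch item.
    """
    return predictions.var(axis=0).mean(axis=-1)


rng = np.random.default_rng(0)
n_models, state_dim, action_dim, batch = 5, 4, 2, 3

# Hypothetical stand-in for an ensemble of dynamics models: one random
# linear map per model from (state, action) to the predicted next state.
weights = rng.normal(size=(n_models, state_dim + action_dim, state_dim))

states = rng.normal(size=(batch, state_dim))
actions = rng.normal(size=(batch, action_dim))
inputs = np.concatenate([states, actions], axis=-1)

# Apply every model to every input: result is (n_models, batch, state_dim).
predictions = np.einsum("bi,mio->mbo", inputs, weights)

rewards = disagreement_reward(predictions)
print(rewards.shape)
```

In this sketch the reward is large where the models disagree and is exactly zero where all models make the same prediction, so an agent maximizing it would be drawn toward transitions the ensemble has not yet learned to predict consistently.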


Pages: 10
