Self-Supervised Exploration via Disagreement

Cited by: 0

Authors:

Pathak, Deepak [1]

Gandhi, Dhiraj [2]

Gupta, Abhinav [2,3]

Affiliations:

[1] UC Berkeley, Berkeley, CA 94720 USA

[2] CMU, Pittsburgh, PA USA

[3] Facebook AI Res, Menlo Pk, CA USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97 | 2019 / Vol. 97

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Efficient exploration is a long-standing problem in sensorimotor learning. Major advances have been demonstrated in noise-free, non-stochastic domains such as video games and simulation. However, most of these formulations either get stuck in environments with stochastic dynamics or are too inefficient to be scalable to real robotics setups. In this paper, we propose a formulation for exploration inspired by the work in active learning literature. Specifically, we train an ensemble of dynamics models and incentivize the agent to explore such that the disagreement of those ensembles is maximized. This allows the agent to learn skills by exploring in a self-supervised manner without any external reward. Notably, we further leverage the disagreement objective to optimize the agent's policy in a differentiable manner, without using reinforcement learning, which results in a sample-efficient exploration. We demonstrate the efficacy of this formulation across a variety of benchmark environments including stochastic-Atari, Mujoco and Unity. Finally, we implement our differentiable exploration on a real robot which learns to interact with objects completely from scratch. Project videos and code are at https://pathak22.github.io/exploration-by-disagreement/.
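The disagreement signal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes disagreement is measured as the variance of the ensemble's next-state predictions, and it uses fixed random linear maps as stand-ins for trained dynamics models. All names (`make_linear_model`, `predict`, `disagreement`) are hypothetical and chosen only for this example.

```python
import random
from statistics import pvariance


def make_linear_model(in_dim, out_dim, rng):
    """Return a random linear map standing in for one learned dynamics model."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def predict(model, state, action):
    """Predict next-state features from the concatenated state and action."""
    x = list(state) + list(action)
    return [sum(w * v for w, v in zip(row, x)) for row in model]


def disagreement(models, state, action):
    """Variance of the ensemble's predictions, averaged over output dimensions."""
    preds = [predict(m, state, action) for m in models]
    per_dim = [pvariance(dim_values) for dim_values in zip(*preds)]
    return sum(per_dim) / len(per_dim)


rng = random.Random(0)
# Assumption: five ensemble members, 2-D state, 1-D action.
ensemble = [make_linear_model(in_dim=3, out_dim=2, rng=rng) for _ in range(5)]

state = [0.5, -1.0]
candidate_actions = [[0.0], [1.0], [5.0]]

# Treat disagreement as the intrinsic reward and pick the action that maximizes it.
rewards = [disagreement(ensemble, state, a) for a in candidate_actions]
best_action = candidate_actions[rewards.index(max(rewards))]
```

In this toy setting the agent simply scores a few candidate actions; the abstract's differentiable variant would instead optimize the policy directly through the disagreement objective, which this sketch does not attempt.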


Pages: 10

