Visual Reinforcement Learning With Self-Supervised 3D Representations

Cited by: 10
Authors
Ze, Yanjie [1, 2]
Hansen, Nicklas [2 ]
Chen, Yinbo [2 ]
Jain, Mohit [2 ]
Wang, Xiaolong [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China
[2] Univ Calif San Diego, San Diego, CA 92093 USA
Keywords
Three-dimensional displays; Task analysis; Visualization; Cameras; Representation learning; Training; Robot vision systems; Reinforcement learning; Deep learning for visual perception
DOI
10.1109/LRA.2023.3259681
Chinese Library Classification (CLC) number
TP24 [Robotics]
Subject classification codes
080202; 1405
Abstract
A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample-efficiency and generalization through additional learning signal and inductive biases. However, while the real world is inherently 3D, prior efforts have largely been focused on leveraging 2D computer vision techniques as auxiliary self-supervision. In this work, we present a unified framework for self-supervised learning of 3D representations for motor control. Our proposed framework consists of two phases: a pretraining phase where a deep voxel-based 3D autoencoder is pretrained on a large object-centric dataset, and a finetuning phase where the representation is jointly finetuned together with RL on in-domain data. We empirically show that our method enjoys improved sample efficiency compared to 2D representation learning methods. Additionally, our learned policies transfer zero-shot to a real robot setup with only approximate geometric correspondence, and successfully solve motor control tasks that involve grasping and lifting from a single, uncalibrated RGB camera.
Pages: 2890-2897
Page count: 8
Related papers
50 in total
  • [31] PatchMixing Masked Autoencoders for 3D Point Cloud Self-Supervised Learning
    Lin, Chengxing
    Xu, Wenju
    Zhu, Jian
    Nie, Yongwei
    Cai, Ruichu
    Xu, Xuemiao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9882 - 9897
  • [32] SSL-Rehab: Assessment of physical rehabilitation exercises through self-supervised learning of 3D skeleton representations
    Kourbane, Ikram
    Papadakis, Panagiotis
    Andries, Mihai
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [33] Self-supervised Secondary Landmark Detection via 3D Representation Learning
    Bala, Praneet
    Zimmermann, Jan
    Park, Hyun Soo
    Hayden, Benjamin Y.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023, 131 (08) : 1980 - 1994
  • [34] Self-Supervised 3D Action Representation Learning With Skeleton Cloud Colorization
    Yang, Siyuan
    Liu, Jun
    Lu, Shijian
    Hwa, Er Meng
    Hu, Yongjian
    Kot, Alex C.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (01) : 509 - 524
  • [35] Self-supervised Adversarial Masking for 3D Point Cloud Representation Learning
    Szachniewicz, Michal
    Kozlowski, Wojciech
    Stypulkowski, Michal
    Zieba, Maciej
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024, 2024, 14796 : 156 - 168
  • [36] Modeling the Uncertainty for Self-supervised 3D Skeleton Action Representation Learning
    Su, Yukun
    Lin, Guosheng
    Sun, Ruizhou
    Hao, Yun
    Wu, Qingyao
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 769 - 778
  • [38] Consistent 3D Hand Reconstruction in Video via Self-Supervised Learning
    Tu, Zhigang
    Huang, Zhisheng
    Chen, Yujin
    Kang, Di
    Bao, Linchao
    Yang, Bisheng
    Yuan, Junsong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (08) : 9469 - 9485
  • [39] Self-supervised Learning for Sketch-Based 3D Shape Retrieval
    Chen, Zhixiang
    Zhao, Haifeng
    Zhang, Yan
    Sun, Guozi
    Wu, Tianjian
    PATTERN RECOGNITION AND COMPUTER VISION, PT I, PRCV 2022, 2022, 13534 : 318 - 329
  • [40] Capturing implicit hierarchical structure in 3D biomedical images with self-supervised hyperbolic representations
    Hsu, Joy
    Gu, Jeffrey
    Wu, Gong Her
    Chiu, Wah
    Yeung, Serena
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34