Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data

Citations: 0
Authors
Feng, Tuo [1]
Wang, Wenguan [2]
Quan, Ruijie [2]
Yang, Yi [2]
Affiliations
[1] Univ Technol Sydney, AAII, ReLER, Sydney, NSW, Australia
[2] Zhejiang Univ, CCAI, ReLER, Hangzhou, Peoples R China
Source
Keywords
Self-supervised Learning; 3D Scene Data; 3D Shape Data
DOI
10.1007/978-3-031-73001-6_5
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current self-supervised learning methods for 3D scenes face a data desert issue, resulting from the time-consuming and expensive process of collecting 3D scene data. Conversely, 3D shape datasets are easier to collect. Despite this, existing pre-training strategies on shape data offer limited potential for 3D scene understanding due to significant disparities in point quantities. To tackle these challenges, we propose Shape2Scene (S2S), a novel method that learns representations of large-scale 3D scenes from 3D shape data. We first design multi-scale and high-resolution backbones for shape-level and scene-level 3D tasks, i.e., MH-P (point-based) and MH-V (voxel-based). MH-P/V establishes direct paths to high-resolution features that capture deep semantic information across multiple scales. This property makes them suitable for a wide range of 3D downstream tasks that rely heavily on high-resolution features. We then employ a Shape-to-Scene strategy (S2SS) to amalgamate points from various shapes, creating a random pseudo scene (comprising multiple objects) as training data and mitigating disparities between shapes and scenes. Finally, a point-point contrastive loss (PPC) is applied for the pre-training of MH-P/V. In PPC, the inherent correspondence (i.e., point pairs) is naturally obtained in S2SS. Extensive experiments demonstrate the transferability of 3D representations learned by MH-P/V across shape-level and scene-level 3D tasks. MH-P achieves notable performance on well-known point cloud datasets (93.8% OA on ScanObjectNN and 87.6% instance mIoU on ShapeNetPart). MH-V also achieves promising performance in 3D semantic segmentation and 3D object detection.
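The two pre-training ingredients named in the abstract, composing a pseudo scene from shapes and contrasting matched point pairs, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`make_pseudo_scene`, `augment`, `point_contrastive_loss`), the random x/y placement, the z-rotation augmentation, the InfoNCE form of the loss, and the temperature value are all assumptions chosen for illustration, and a random linear map stands in for the MH-P/V backbone.

```python
import numpy as np


def make_pseudo_scene(shapes, rng, extent=4.0):
    """Place each shape at a random x/y offset and concatenate the points.

    Hypothetical stand-in for a shape-to-scene step. Returns the scene
    points (N, 3) and, per point, the index of the shape it came from.
    """
    placed, shape_ids = [], []
    for i, pts in enumerate(shapes):
        offset = np.zeros(3)
        offset[:2] = rng.uniform(-extent, extent, size=2)  # assumed placement rule
        placed.append(pts + offset)
        shape_ids.append(np.full(len(pts), i))
    return np.concatenate(placed), np.concatenate(shape_ids)


def augment(points, rng, jitter=0.01):
    """Random rotation about z plus Gaussian jitter (assumed augmentation).

    Row order is preserved, so row i of two augmented views is the same
    underlying point: the point pairs come for free from the construction.
    """
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T + rng.normal(scale=jitter, size=points.shape)


def point_contrastive_loss(feat_a, feat_b, temperature=0.1):
    """InfoNCE over matched rows: feat_a[i] and feat_b[i] are positives,
    every other row of feat_b serves as a negative for feat_a[i]."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three toy "shapes" with different point counts.
    shapes = [rng.normal(scale=0.5, size=(n, 3)) for n in (40, 60, 28)]
    scene, ids = make_pseudo_scene(shapes, rng)

    view_a, view_b = augment(scene, rng), augment(scene, rng)
    # Placeholder encoder: a random linear projection, NOT a real backbone.
    weights = rng.normal(size=(3, 16))
    loss = point_contrastive_loss(view_a @ weights, view_b @ weights)
    print(scene.shape, np.bincount(ids), loss)
```

Because the pseudo scene is assembled from known points, no matching step is needed to find positives. In this sketch that bookkeeping is just the shared row order between the two views.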
Pages: 73-91
Page count: 19