Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data

Cited by: 0
Authors
Feng, Tuo [1]
Wang, Wenguan [2]
Quan, Ruijie [2]
Yang, Yi [2]
Affiliations
[1] Univ Technol Sydney, AAII, ReLER, Sydney, NSW, Australia
[2] Zhejiang Univ, CCAI, ReLER, Hangzhou, Peoples R China
Source
Keywords
Self-supervised Learning; 3D Scene Data; 3D Shape Data;
DOI
10.1007/978-3-031-73001-6_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Current self-supervised learning methods for 3D scenes face a data desert issue, because collecting 3D scene data is time-consuming and expensive. Conversely, 3D shape datasets are easier to collect. Despite this, existing pre-training strategies on shape data offer limited potential for 3D scene understanding due to significant disparities in point quantities. To tackle these challenges, we propose Shape2Scene (S2S), a novel method that learns representations of large-scale 3D scenes from 3D shape data. We first design multi-scale and high-resolution backbones for shape- and scene-level 3D tasks, i.e., MH-P (point-based) and MH-V (voxel-based). MH-P/V establishes direct paths to high-resolution features that capture deep semantic information across multiple scales. This property makes them suitable for a wide range of 3D downstream tasks that rely heavily on high-resolution features. We then employ a Shape-to-Scene strategy (S2SS) to amalgamate points from various shapes, creating a random pseudo scene (comprising multiple objects) as training data and mitigating disparities between shapes and scenes. Finally, a point-point contrastive loss (PPC) is applied for the pre-training of MH-P/V. In PPC, the inherent correspondence (i.e., point pairs) is naturally obtained in S2SS. Extensive experiments demonstrate the transferability of 3D representations learned by MH-P/V across shape-level and scene-level 3D tasks. MH-P achieves notable performance on well-known point cloud datasets (93.8% OA on ScanObjectNN and 87.6% instance mIoU on ShapeNetPart). MH-V also achieves promising performance in 3D semantic segmentation and 3D object detection.
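As a rough illustration of the two ideas named in the abstract, the sketch below composes a pseudo scene from several shapes and scores matched point features with an InfoNCE-style contrastive loss. It is not the authors' implementation: the function names `make_pseudo_scene` and `info_nce`, the random planar placement, the temperature value, and the use of row index as the point-pair correspondence are all assumptions made for illustration.

```python
import math
import random


def make_pseudo_scene(shapes, extent=4.0, seed=0):
    """Shift each shape by a random planar offset and merge all points.

    Returns the merged points and, per point, the index of its source shape.
    The placement rule (uniform x/y offset, z unchanged) is an assumption.
    """
    rng = random.Random(seed)
    scene, shape_ids = [], []
    for sid, pts in enumerate(shapes):
        ox = rng.uniform(-extent, extent)
        oy = rng.uniform(-extent, extent)
        for x, y, z in pts:
            scene.append((x + ox, y + oy, z))
            shape_ids.append(sid)
    return scene, shape_ids


def info_nce(feats_a, feats_b, temperature=0.1):
    """InfoNCE loss where row i of feats_a is the positive for row i of feats_b.

    All other rows of feats_b act as negatives for row i.
    """
    def normalize(v):
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        return [c / norm for c in v]

    a = [normalize(v) for v in feats_a]
    b = [normalize(v) for v in feats_b]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of point i to every candidate, scaled by temperature.
        logits = [sum(p * q for p, q in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


# Two toy shapes merged into one pseudo scene.
shapes = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 1.0, 0.0)]]
scene, shape_ids = make_pseudo_scene(shapes)

# Stand-in per-point features; a real pipeline would take these from a backbone.
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
loss_matched = info_nce(feats, feats)
loss_mismatched = info_nce(feats, [feats[1], feats[2], feats[0]])
```

Because the scene is assembled from known shapes, each scene point can be traced back to its source point, which is one plausible reading of how the point pairs arise without extra matching. In this toy setup the loss is lower when the pairs are correctly aligned than when they are shuffled.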
Pages: 73-91
Page count: 19