Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data

Citations: 0
Authors
Feng, Tuo [1]
Wang, Wenguan [2]
Quan, Ruijie [2]
Yang, Yi [2]
Affiliations
[1] Univ Technol Sydney, AAII, ReLER, Sydney, NSW, Australia
[2] Zhejiang Univ, CCAI, ReLER, Hangzhou, Peoples R China
Source
Keywords
Self-supervised Learning; 3D Scene Data; 3D Shape Data
DOI
10.1007/978-3-031-73001-6_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Current self-supervised learning methods for 3D scenes face a data desert: collecting 3D scene data is time-consuming and expensive. 3D shape datasets, by contrast, are easier to collect. Even so, existing pre-training strategies on shape data offer limited potential for 3D scene understanding, because shapes and scenes differ greatly in point count. To tackle these challenges, we propose Shape2Scene (S2S), a novel method that learns representations of large-scale 3D scenes from 3D shape data. We first design multi-scale, high-resolution backbones for shape-level and scene-level 3D tasks: MH-P (point-based) and MH-V (voxel-based). MH-P/V establishes direct paths to high-resolution features that capture deep semantic information across multiple scales. This property makes the backbones suitable for a wide range of 3D downstream tasks that rely heavily on high-resolution features. We then employ a Shape-to-Scene strategy (S2SS) that combines points from various shapes into a random pseudo scene (comprising multiple objects) used as training data, mitigating the disparity between shapes and scenes. Finally, a point-point contrastive loss (PPC) is applied to pre-train MH-P/V; the correspondences it requires (i.e., point pairs) are obtained naturally from S2SS. Extensive experiments demonstrate the transferability of the 3D representations learned by MH-P/V across shape-level and scene-level 3D tasks. MH-P achieves notable performance on well-known point cloud datasets (93.8% OA on ScanObjectNN and 87.6% instance mIoU on ShapeNetPart). MH-V also achieves promising performance in 3D semantic segmentation and 3D object detection.
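The S2SS and PPC steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: the function names (make_pseudo_scene, random_view, point_contrastive_loss), the random-offset placement rule, the z-rotation and jitter augmentation, the InfoNCE form of the loss, and the random linear map that stands in for the MH-P/V backbone.

```python
import numpy as np

def make_pseudo_scene(shapes, rng, extent=4.0):
    """Place each shape at a random offset and concatenate all points (assumed rule)."""
    placed = []
    for pts in shapes:
        offset = rng.uniform(-extent, extent, size=3)
        offset[2] = 0.0  # keep objects at a common height (assumption)
        placed.append(pts + offset)
    return np.concatenate(placed, axis=0)

def random_view(points, rng):
    """Rotate about z and add jitter; output row i still corresponds to input row i."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T + rng.normal(scale=0.01, size=points.shape)

def point_contrastive_loss(feat_a, feat_b, temperature=0.1):
    """InfoNCE over matched rows: row i of feat_a is the positive of row i of feat_b."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
shapes = [rng.normal(size=(128, 3)) for _ in range(4)]  # toy stand-ins for shape point clouds
scene = make_pseudo_scene(shapes, rng)
view_a, view_b = random_view(scene, rng), random_view(scene, rng)

# Stand-in encoder: a fixed random linear map (hypothetical; not MH-P/V).
w = rng.normal(size=(3, 16))
loss = point_contrastive_loss(view_a @ w, view_b @ w)
```

Because both views are derived from the same pseudo scene with row order preserved, the positive point pairs come for free, which is the property the abstract attributes to S2SS.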
Pages: 73-91
Page count: 19