Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding

Cited by: 2

Authors:

Jiang, Li [1]

Yang, Zetong [2]

Shi, Shaoshuai [1]

Golyanik, Vladislav [1]

Dai, Dengxin [1]

Schiele, Bernt [1]

Affiliations:

[1] Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany

[2] The Chinese University of Hong Kong (CUHK), Hong Kong, China

Source:

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2023

Keywords:

DOI:

10.1109/CVPR52729.2023.00119

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Masked signal modeling has greatly advanced self-supervised pre-training for language and 2D images. However, it is still not fully explored in 3D scene understanding. Thus, this paper introduces Masked Shape Prediction (MSP), a new framework to conduct masked signal modeling in 3D scenes. MSP uses the essential 3D semantic cue, i.e., geometric shape, as the prediction target for masked points. The context-enhanced shape target consisting of explicit shape context and implicit deep shape feature is proposed to facilitate exploiting contextual cues in shape prediction. Meanwhile, the pre-training architecture in MSP is carefully designed to alleviate the masked shape leakage from point coordinates. Experiments on multiple 3D understanding tasks on both indoor and outdoor datasets demonstrate the effectiveness of MSP in learning good feature representations to consistently boost downstream performance.


Pages: 1168-1178

Page count: 11

Related items: 50 in total

[41] SPAKT: A Self-Supervised Pre-TrAining Method for Knowledge Tracing
Ma, Yuling
Han, Peng
Qiao, Huiyan
Cui, Chaoran
Yin, Yilong
Yu, Dehu
IEEE ACCESS, 2022, 10 : 72145 - 72154
[42] Correlational Image Modeling for Self-Supervised Visual Pre-Training
Li, Wei
Xie, Jiahao
Loy, Chen Change
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15105 - 15115
[43] MEASURING THE IMPACT OF DOMAIN FACTORS IN SELF-SUPERVISED PRE-TRAINING
Sanabria, Ramon
Hsu, Wei-Ning
Baevski, Alexei
Auli, Michael
2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023,
[44] Contrastive Self-Supervised Pre-Training for Video Quality Assessment
Chen, Pengfei
Li, Leida
Wu, Jinjian
Dong, Weisheng
Shi, Guangming
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 458 - 471
[45] Uni4Eye: Unified 2D and 3D Self-supervised Pre-training via Masked Image Modeling Transformer for Ophthalmic Image Classification
Cai, Zhiyuan
Lin, Li
He, Huaqing
Tang, Xiaoying
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT VIII, 2022, 13438 : 88 - 98
[46] Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
Zhang, Taolin
He, Sunan
Dai, Tao
Wang, Zhi
Chen, Bin
Xia, Shu-Tao
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7296 - 7304
[47] MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
Xu, Runsen
Wang, Tai
Zhang, Wenwei
Chen, Runjian
Cao, Jinkun
Pang, Jiangmiao
Lin, Dahua
arXiv, 2023,
[48] GO-MAE: Self-supervised pre-training via masked autoencoder for OCT image classification of gynecology
Wang, Haoran
Guo, Xinyu
Song, Kaiwen
Sun, Mingyang
Shao, Yanbin
Xue, Songfeng
Zhang, Hongwei
Zhang, Tianyu
NEURAL NETWORKS, 2025, 181
[49] Intra-modality masked image modeling: A self-supervised pre-training method for brain tumor segmentation
Qi, Liangce
Shi, Weili
Miao, Yu
Li, Yonghui
Feng, Guanyuan
Jiang, Zhengang
BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 95
[50] MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
Xu, Runsen
Wang, Tai
Zhang, Wenwei
Chen, Runjian
Cao, Jinkun
Pang, Jiangmiao
Lin, Dahua
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 13445 - 13454
