Point-MPP: Point Cloud Self-Supervised Learning From Masked Position Prediction

Cited by: 0
Authors
Fan, Songlin [1, 2]
Gao, Wei [1, 2]
Li, Ge [1]
Affiliations
[1] Peking Univ, Sch Elect & Comp Engn, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Sch Elect & Comp Engn, Shenzhen 518066, Peoples R China
Keywords
Point cloud compression; Semantics; Transformers; Standards; Feature extraction; Training; Circuit faults; Predictive models; Encoding; Image reconstruction; Masked position prediction; point cloud; pretraining; self-supervised learning (SSL)
DOI
10.1109/TNNLS.2024.3479309
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Masked autoencoding has gained momentum for improving fine-tuning performance in many downstream tasks. However, it tends to focus on low-level reconstruction details, lacking high-level semantics and resulting in weak transfer capability. This article presents a novel jigsaw puzzle solver inspired by the idea that predicting the positions of disordered point cloud patches provides more semantic information, similar to how children learn by solving jigsaw puzzles. Our method adopts the mask-then-predict paradigm, erasing the positions of selected point patches rather than their contents. We first partition input point clouds into irregular patches and randomly erase the positions of some patches. Then, a Transformer-based model is used to learn high-level semantic features and regress the positions of the masked patches. This approach forces the model to focus on learning transfer-robust semantics while paying less attention to low-level details. To tie the predictions within the encoding space, we further introduce a consistency constraint on their latent representations to encourage the encoded features to contain more semantic cues. We demonstrate that a standard Transformer backbone with our pretraining scheme can capture discriminative point cloud semantic information. Furthermore, extensive experiments indicate that our method outperforms the previous best competitor across six popular downstream vision tasks, achieving new state-of-the-art performance.
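The mask-then-predict setup described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how patches are formed, what mask ratio is used, or which loss is applied, so the farthest point sampling plus k-nearest-neighbor grouping, the 50% mask ratio, the zero placeholder for erased positions, the mean squared error loss, and every function name below are assumptions chosen for illustration. The Transformer and the latent consistency constraint are omitted.

```python
import numpy as np


def farthest_point_sample(points, num_centers, seed=0):
    """Greedily pick well-spread point indices (assumed patch centers)."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = np.empty(num_centers, dtype=np.int64)
    chosen[0] = rng.integers(n)
    dist = np.full(n, np.inf)  # squared distance to the nearest chosen center
    for i in range(1, num_centers):
        diff = points - points[chosen[i - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(dist))
    return chosen


def make_position_masked_patches(points, num_patches=8, patch_size=16,
                                 mask_ratio=0.5, seed=0):
    """Split a cloud into patches and erase the positions of some patches.

    Patch contents are kept for every patch (in center-relative
    coordinates); only the patch centers of masked patches are hidden.
    """
    rng = np.random.default_rng(seed)
    centers = points[farthest_point_sample(points, num_patches, seed)]  # (G, 3)
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(-1)      # (G, N)
    knn = np.argsort(d2, axis=1)[:, :patch_size]                        # (G, K)
    patches = points[knn] - centers[:, None, :]                         # (G, K, 3)

    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, num_masked, replace=False)] = True

    # Assumption: erased positions are replaced by zeros; a real model
    # would more likely use a learned mask token here.
    model_positions = np.where(mask[:, None], 0.0, centers)
    targets = centers[mask]  # what a position-regression head should output
    return patches, model_positions, mask, targets


def position_loss(predicted, targets):
    """Mean squared error between predicted and true masked positions."""
    return float(np.mean((predicted - targets) ** 2))
```

In this sketch a model would receive `patches` and `model_positions`, output one 3-D position per masked patch, and be trained by minimizing `position_loss` against `targets`.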
Pages: 13