Mutual information-driven self-supervised point cloud pre-training

Cited: 0
Authors
Xu, Weichen [1 ]
Fu, Tianhao [1 ]
Cao, Jian [1 ]
Zhao, Xinyu [1 ]
Xu, Xinxin [1 ]
Cao, Xixin [1 ]
Zhang, Xing [1 ,2 ]
Affiliations
[1] Peking Univ, Sch Software & Microelect, Beijing 100871, Peoples R China
[2] Peking Univ, Shenzhen Grad Sch, Key Lab Integrated Microsyst, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Self-supervised learning; Autonomous driving; Point cloud scene understanding; Mutual information; High-level features; OPTIMIZATION;
DOI
10.1016/j.knosys.2024.112741
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Learning universal representations from unlabeled 3D point clouds is essential to improving the generalization and safety of autonomous driving. Generative self-supervised point cloud pre-training with low-level features as pretext tasks is a mainstream paradigm. However, from the perspective of mutual information, this approach is constrained by spatial information and entangled representations. In this study, we propose a generalized generative self-supervised point cloud pre-training framework called GPICTURE. High-level features are used as an additional pretext task to enhance the understanding of semantic information. Considering the varying reconstruction difficulty caused by differences in the discriminability of voxel features, we design inter-class and intra-class discrimination-guided masking (I2Mask) to set the masking ratio adaptively. Furthermore, to ensure a hierarchical and stable reconstruction process, centered kernel alignment-guided hierarchical reconstruction and differential-gated progressive learning are employed to control the multiple reconstruction tasks. Theoretical analyses demonstrate that introducing high-level features enhances the mutual information between the latent features and the high-level features, as well as between the latent features and the input point cloud. On Waymo, nuScenes, and SemanticKITTI, we achieve 75.55% mAP for 3D object detection, 79.7% mIoU for 3D semantic segmentation, and 18.8% mIoU for occupancy prediction. Notably, with only 50% of the fine-tuning data, GPICTURE performs close to training from scratch with 100% of the fine-tuning data. In addition, visualizations consistent with the downstream tasks and a 57% reduction in weight disparity demonstrate a better fine-tuning starting point. The project page is hosted at https://gpicture-page.github.io/.
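The record does not spell out how the centered kernel alignment (CKA) guidance in the hierarchical reconstruction is computed. As a rough, non-authoritative sketch, the snippet below computes the standard linear CKA similarity between two feature matrices (the usual building block behind such layer-wise alignment signals); the variable names, shapes, and the way the score would be used are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def linear_cka(feats_a, feats_b):
        """Linear Centered Kernel Alignment between two feature matrices.

        feats_a: (n, d1) features for n samples from one layer/representation.
        feats_b: (n, d2) features for the same n samples from another layer.
        Returns a similarity in [0, 1]; higher means more similar representations.
        """
        # Center each feature dimension across the n samples.
        feats_a = feats_a - feats_a.mean(axis=0, keepdims=True)
        feats_b = feats_b - feats_b.mean(axis=0, keepdims=True)

        # Linear CKA: ||B^T A||_F^2 / (||A^T A||_F * ||B^T B||_F).
        cross = np.linalg.norm(feats_b.T @ feats_a, ord="fro") ** 2
        norm_a = np.linalg.norm(feats_a.T @ feats_a, ord="fro")
        norm_b = np.linalg.norm(feats_b.T @ feats_b, ord="fro")
        return cross / (norm_a * norm_b)

    # Illustrative usage with random stand-ins for decoder features at two depths.
    rng = np.random.default_rng(0)
    shallow_feats = rng.normal(size=(128, 256))  # hypothetical shallow-layer features
    deep_feats = rng.normal(size=(128, 512))     # hypothetical deep-layer features
    print(f"linear CKA: {linear_cka(shallow_feats, deep_feats):.3f}")

A score like this could, for example, indicate how similar the representations at different reconstruction levels are and thus inform how the hierarchical reconstruction tasks are scheduled or weighted; the exact guidance rule is defined in the full paper rather than in this record.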
Pages: 16