Mutual information-driven self-supervised point cloud pre-training

Times cited: 0

Authors:

Xu, Weichen [1]

Fu, Tianhao [1]

Cao, Jian [1]

Zhao, Xinyu [1]

Xu, Xinxin [1]

Cao, Xixin [1]

Zhang, Xing [1,2]

Affiliations:

[1] Peking Univ, Sch Software & Microelect, Beijing 100871, Peoples R China

[2] Peking Univ, Shenzhen Grad Sch, Key Lab Integrated Microsyst, Shenzhen 518055, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2025 / Vol. 307

Funding:

National Natural Science Foundation of China

Keywords:

Self-supervised learning; Autonomous driving; Point cloud scene understanding; Mutual information; High-level features; OPTIMIZATION;

DOI:

10.1016/j.knosys.2024.112741

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Learning universal representations from unlabeled 3D point clouds is essential for improving the generalization and safety of autonomous driving. Generative self-supervised point cloud pre-training, which uses low-level features as pretext tasks, is a mainstream paradigm. However, from the perspective of mutual information, this approach is constrained by spatial information and entangled representations. In this study, we propose a generalized generative self-supervised point cloud pre-training framework called GPICTURE. High-level features were used as an additional pretext task to enhance the understanding of semantic information. To account for the varying task difficulty caused by differences in the discrimination of voxel features, we designed inter-class and intra-class discrimination-guided masking (I2Mask), which sets the masking ratio adaptively. Furthermore, to ensure a hierarchical and stable reconstruction process, centered kernel alignment-guided hierarchical reconstruction and differential-gated progressive learning were employed to control the multiple reconstruction tasks. Theoretical analyses demonstrated that high-level features can enhance the mutual information between the latent features and both the high-level features and the input point cloud. On Waymo, nuScenes, and SemanticKITTI, we achieved 75.55% mAP for 3D object detection, 79.7% mIoU for 3D semantic segmentation, and 18.8% mIoU for occupancy prediction. Notably, when fine-tuned with only 50% of the fine-tuning data, GPICTURE performed close to a model trained from scratch with 100% of the fine-tuning data. In addition, visualizations consistent with downstream tasks and a 57% reduction in weight disparity demonstrated a better starting point for fine-tuning. The project page is hosted at https://gpicture-page.github.io/.
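The abstract names centered kernel alignment (CKA) as the signal guiding hierarchical reconstruction, but this record does not say which CKA variant GPICTURE uses or how it enters the training objective. As a reference point only, the sketch below computes the standard linear CKA (Kornblith et al., 2019) between two feature matrices for the same samples. The function name `linear_cka` and the random stand-in features are illustrative assumptions, not the authors' code.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Standard linear CKA between two feature matrices (illustrative sketch).

    x: (n, d1) features and y: (n, d2) features for the same n samples.
    Returns a similarity in [0, 1]; 1 means the representations match up to
    an orthogonal transform and isotropic scaling.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows (samples)")
    # Center each feature column so the implied linear kernels are centered.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Numerator: squared Frobenius norm of the cross-covariance, ||Y^T X||_F^2.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    # Normalizers: ||X^T X||_F and ||Y^T Y||_F.
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

For example, comparing a feature matrix with a rotated and rescaled copy of itself gives a value of 1 (up to floating-point error), whereas two independent random matrices give a small value. This invariance to rotation and isotropic scaling is what makes CKA a convenient way to compare representations across layers of different widths.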

Pages: 16