Point-to-Pixel Prompting for Point Cloud Analysis With Pre-Trained Image Models

Cited by: 4
Authors
Wang, Ziyi [1 ]
Rao, Yongming [1 ]
Yu, Xumin [1 ]
Zhou, Jie [1 ]
Lu, Jiwen [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Point cloud compression; Three-dimensional displays; Task analysis; Solid modeling; Tuning; Analytical models; Feature extraction; Distillation; point cloud; prompt tuning;
DOI
10.1109/TPAMI.2024.3354961
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Nowadays, pre-training big models on large-scale datasets has achieved great success and dominated many downstream tasks in natural language processing and 2D vision, while pre-training in 3D vision is still under development. In this paper, we provide a new perspective of transferring pre-trained knowledge from the 2D domain to the 3D domain with Point-to-Pixel Prompting in data space and Pixel-to-Point distillation in feature space, exploiting shared knowledge in images and point clouds that depict the same visual world. Following the principle of prompt engineering, Point-to-Pixel Prompting transforms point clouds into colorful images with geometry-preserved projection and geometry-aware coloring. The pre-trained image models can then be directly applied to point cloud tasks without structural changes or weight modifications. With projection correspondence in feature space, Pixel-to-Point distillation further regards pre-trained image models as the teacher and distills pre-trained 2D knowledge into student point cloud models, remarkably enhancing inference efficiency and model capacity for point cloud analysis. We conduct extensive experiments in both object classification and scene segmentation under various settings to demonstrate the superiority of our method. In object classification, we reveal the important scale-up trend of Point-to-Pixel Prompting and attain 90.3% accuracy on the ScanObjectNN dataset, surpassing previous literature by a large margin. In scene-level semantic segmentation, our method outperforms traditional 3D analysis approaches and shows competitive capacity in dense prediction tasks.
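The core idea of Point-to-Pixel Prompting described above can be illustrated with a minimal sketch: project a point cloud onto an image plane and color each pixel from the geometry of the points that land on it, so a frozen 2D image model can consume the result. This is an illustrative stand-in only; the function name `point_to_pixel`, the orthographic projection, and the depth-based pseudo-coloring are assumptions for clarity and do not reproduce the paper's actual projection or coloring schemes.

```python
import numpy as np

def point_to_pixel(points, img_size=64):
    """Project an (N, 3) point cloud to an (H, W, 3) image.

    Illustrative only: orthographic projection onto the XY plane,
    with depth (z) used as a simple geometry-aware color cue.
    """
    # Normalize the cloud into the unit cube [0, 1]^3.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    pts = (points - mins) / np.maximum(maxs - mins, 1e-8)

    # Map x, y to pixel coordinates; keep z as depth.
    cols = np.clip((pts[:, 0] * (img_size - 1)).astype(int), 0, img_size - 1)
    rows = np.clip((pts[:, 1] * (img_size - 1)).astype(int), 0, img_size - 1)
    depth = pts[:, 2]

    # Z-buffer: keep only the nearest point per pixel.
    image = np.zeros((img_size, img_size, 3), dtype=np.float32)
    zbuf = np.full((img_size, img_size), -np.inf)
    for r, c, z in zip(rows, cols, depth):
        if z > zbuf[r, c]:
            zbuf[r, c] = z
            # Depth-based pseudo-color: a stand-in for geometry-aware coloring.
            image[r, c] = (z, 1.0 - z, 0.5)
    return image

cloud = np.random.rand(1024, 3)
img = point_to_pixel(cloud)
print(img.shape)  # (64, 64, 3)
```

Because the output is an ordinary H×W×3 image, it can be fed to a pre-trained image backbone unchanged, which is what lets the prompting scheme reuse 2D weights without structural modification.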
Pages: 4381-4397
Page count: 17