Point-to-Pixel Prompting for Point Cloud Analysis With Pre-Trained Image Models

Cited by: 4
Authors
Wang, Ziyi [1 ]
Rao, Yongming [1 ]
Yu, Xumin [1 ]
Zhou, Jie [1 ]
Lu, Jiwen [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Point cloud compression; Three-dimensional displays; Task analysis; Solid modeling; Tuning; Analytical models; Feature extraction; Distillation; point cloud; prompt tuning
DOI
10.1109/TPAMI.2024.3354961
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Pre-training large models on large-scale datasets has achieved great success and now dominates many downstream tasks in natural language processing and 2D vision, while pre-training in 3D vision is still under development. In this paper, we provide a new perspective on transferring pre-trained knowledge from the 2D domain to the 3D domain, with Point-to-Pixel Prompting in data space and Pixel-to-Point distillation in feature space, exploiting the knowledge shared between images and point clouds that depict the same visual world. Following the principle of prompt engineering, Point-to-Pixel Prompting transforms point clouds into colorful images via geometry-preserved projection and geometry-aware coloring, so that pre-trained image models can be applied directly to point cloud tasks without structural changes or weight modifications. Using the projection correspondence in feature space, Pixel-to-Point distillation then treats the pre-trained image model as a teacher and distills its 2D knowledge into a student point cloud model, markedly improving inference efficiency and model capacity for point cloud analysis. We conduct extensive experiments on both object classification and scene segmentation under various settings to demonstrate the superiority of our method. In object classification, we reveal an important scale-up trend of Point-to-Pixel Prompting and attain 90.3% accuracy on the ScanObjectNN dataset, surpassing prior work by a large margin. In scene-level semantic segmentation, our method outperforms traditional 3D analysis approaches and shows competitive capacity on dense prediction tasks.
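The Point-to-Pixel Prompting step described above amounts to rendering the point cloud as an image that a frozen 2D model can consume directly. Below is a minimal sketch of that idea, assuming a simple orthographic projection and a toy depth-driven coloring; the function name, image size, and color scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the Point-to-Pixel Prompting idea: project a point
# cloud onto an image plane so a frozen 2D model can consume it.
# Names and the coloring rule are illustrative, not the paper's code.
import numpy as np

def point_to_pixel_prompt(points, img_size=224):
    """Orthographically project an (N, 3) point cloud to an RGB image.

    Stand-in for the paper's geometry-preserved projection and
    geometry-aware coloring: here depth alone drives the color.
    """
    pts = points - points.mean(axis=0)            # center the cloud
    pts /= np.abs(pts).max() + 1e-8               # scale into [-1, 1]

    # Map x/y to pixel coordinates (orthographic, geometry-preserving).
    uv = ((pts[:, :2] + 1.0) * 0.5 * (img_size - 1)).astype(int)
    depth = (pts[:, 2] + 1.0) * 0.5               # normalize z to [0, 1]

    img = np.zeros((img_size, img_size, 3), dtype=np.float32)
    zbuf = np.full((img_size, img_size), -np.inf)
    for (u, v), d in zip(uv, depth):
        if d > zbuf[v, u]:                        # keep the front-most point
            zbuf[v, u] = d
            img[v, u] = (d, 1.0 - d, 0.5)         # toy geometry-aware color
    return img

# Example: img = point_to_pixel_prompt(np.random.rand(1024, 3))
```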
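Pixel-to-Point distillation can likewise be sketched as a feature-space loss: pixel features from the frozen image teacher supervise the point-cloud student at the pixels each point projects to. The names, shapes, and the cosine objective below are assumptions for illustration; the paper's actual loss may differ.

```python
# Illustrative sketch of Pixel-to-Point distillation via the projection
# correspondence. All names and shapes here are assumptions.
import torch
import torch.nn.functional as F

def pixel_to_point_distill_loss(point_feats, pixel_feats, uv):
    """point_feats: (N, C) student features, one per point.
    pixel_feats: (C, H, W) teacher feature map from the frozen image model.
    uv:          (N, 2) long tensor of each point's projected pixel coords.
    """
    # Gather the teacher feature under each point's projected pixel.
    teacher = pixel_feats[:, uv[:, 1], uv[:, 0]].t().detach()  # (N, C)
    # Cosine distillation: pull student features toward teacher features.
    return 1.0 - F.cosine_similarity(point_feats, teacher, dim=-1).mean()
```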
Pages: 4381-4397
Page count: 17