Point-to-Pixel Prompting for Point Cloud Analysis With Pre-Trained Image Models

Cited by: 4
Authors
Wang, Ziyi [1 ]
Rao, Yongming [1 ]
Yu, Xumin [1 ]
Zhou, Jie [1 ]
Lu, Jiwen [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Point cloud compression; Three-dimensional displays; Task analysis; Solid modeling; Tuning; Analytical models; Feature extraction; Distillation; point cloud; prompt tuning
DOI
10.1109/TPAMI.2024.3354961
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Pre-training large models on large-scale datasets has achieved great success and now dominates many downstream tasks in natural language processing and 2D vision, while pre-training in 3D vision is still under development. In this paper, we provide a new perspective on transferring pre-trained knowledge from the 2D domain to the 3D domain, with Point-to-Pixel Prompting in data space and Pixel-to-Point distillation in feature space, exploiting the knowledge shared between images and point clouds that depict the same visual world. Following the principle of prompt engineering, Point-to-Pixel Prompting transforms point clouds into colorful images via geometry-preserved projection and geometry-aware coloring, so that pre-trained image models can be applied directly to point cloud tasks without structural changes or weight modifications. Using the projection correspondence in feature space, Pixel-to-Point distillation then treats the pre-trained image model as a teacher and distills its 2D knowledge into a student point cloud model, markedly improving inference efficiency and model capacity for point cloud analysis. We conduct extensive experiments on both object classification and scene segmentation under various settings to demonstrate the superiority of our method. In object classification, we reveal an important scale-up trend of Point-to-Pixel Prompting and attain 90.3% accuracy on the ScanObjectNN dataset, surpassing prior work by a large margin. In scene-level semantic segmentation, our method outperforms traditional 3D analysis approaches and shows competitive capacity on dense prediction tasks.
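The Point-to-Pixel Prompting step described above amounts to rendering the point cloud as an image that a frozen 2D model can consume directly. Below is a minimal sketch of that idea, assuming a simple orthographic projection and a toy depth-driven coloring; the function name, image size, and color scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the Point-to-Pixel Prompting idea: project a point
# cloud onto an image plane so a frozen 2D model can consume it.
# Names and the coloring rule are illustrative, not the paper's code.
import numpy as np

def point_to_pixel_prompt(points, img_size=224):
    """Orthographically project an (N, 3) point cloud to an RGB image.

    Stand-in for the paper's geometry-preserved projection and
    geometry-aware coloring: here depth alone drives the color.
    """
    pts = points - points.mean(axis=0)            # center the cloud
    pts /= np.abs(pts).max() + 1e-8               # scale into [-1, 1]

    # Map x/y to pixel coordinates (orthographic, geometry-preserving).
    uv = ((pts[:, :2] + 1.0) * 0.5 * (img_size - 1)).astype(int)
    depth = (pts[:, 2] + 1.0) * 0.5               # normalize z to [0, 1]

    img = np.zeros((img_size, img_size, 3), dtype=np.float32)
    zbuf = np.full((img_size, img_size), -np.inf)
    for (u, v), d in zip(uv, depth):
        if d > zbuf[v, u]:                        # keep the front-most point
            zbuf[v, u] = d
            img[v, u] = (d, 1.0 - d, 0.5)         # toy geometry-aware color
    return img

# Example: img = point_to_pixel_prompt(np.random.rand(1024, 3))
```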
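Pixel-to-Point distillation can likewise be sketched as a feature-space loss: pixel features from the frozen image teacher supervise the point-cloud student at the pixels each point projects to. The names, shapes, and the cosine objective below are assumptions for illustration; the paper's actual loss may differ.

```python
# Illustrative sketch of Pixel-to-Point distillation via the projection
# correspondence. All names and shapes here are assumptions.
import torch
import torch.nn.functional as F

def pixel_to_point_distill_loss(point_feats, pixel_feats, uv):
    """point_feats: (N, C) student features, one per point.
    pixel_feats: (C, H, W) teacher feature map from the frozen image model.
    uv:          (N, 2) long tensor of each point's projected pixel coords.
    """
    # Gather the teacher feature under each point's projected pixel.
    teacher = pixel_feats[:, uv[:, 1], uv[:, 0]].t().detach()  # (N, C)
    # Cosine distillation: pull student features toward teacher features.
    return 1.0 - F.cosine_similarity(point_feats, teacher, dim=-1).mean()
```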
Pages: 4381-4397
Page count: 17