Point-to-Pixel Prompting for Point Cloud Analysis With Pre-Trained Image Models

Cited by: 4
Authors
Wang, Ziyi [1 ]
Rao, Yongming [1 ]
Yu, Xumin [1 ]
Zhou, Jie [1 ]
Lu, Jiwen [1 ]
Affiliation
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Point cloud compression; Three-dimensional displays; Task analysis; Solid modeling; Tuning; Analytical models; Feature extraction; Distillation; point cloud; prompt tuning;
DOI
10.1109/TPAMI.2024.3354961
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-training large models on large-scale datasets has achieved great success and now dominates many downstream tasks in natural language processing and 2D vision, while pre-training in 3D vision is still under development. In this paper, we provide a new perspective on transferring pre-trained knowledge from the 2D domain to the 3D domain, with Point-to-Pixel Prompting in data space and Pixel-to-Point distillation in feature space, exploiting the knowledge shared between images and point clouds that depict the same visual world. Following the principle of prompt engineering, Point-to-Pixel Prompting transforms point clouds into colorful images with geometry-preserved projection and geometry-aware coloring, so that pre-trained image models can be directly applied to point cloud tasks without structural changes or weight modifications. Using the projection correspondence in feature space, Pixel-to-Point distillation then treats pre-trained image models as teachers and distills pre-trained 2D knowledge into student point cloud models, remarkably enhancing inference efficiency and model capacity for point cloud analysis. We conduct extensive experiments in both object classification and scene segmentation under various settings to demonstrate the superiority of our method. In object classification, we reveal an important scale-up trend of Point-to-Pixel Prompting and attain 90.3% accuracy on the ScanObjectNN dataset, surpassing previous literature by a large margin. In scene-level semantic segmentation, our method outperforms traditional 3D analysis approaches and shows competitive capacity in dense prediction tasks.
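The core idea in the abstract, rendering a point cloud as an image that a pre-trained 2D model can consume, can be sketched as a minimal orthographic height-map projection. This is an illustrative approximation only, not the paper's geometry-preserved projection or geometry-aware coloring scheme; the function name and parameters are hypothetical.

```python
import numpy as np

def point_to_pixel_projection(points, img_size=32):
    """Project an (N, 3) point cloud onto an img_size x img_size grid,
    keeping the maximum normalized height per pixel (a simple height map)."""
    pts = np.asarray(points, dtype=np.float64)
    # Normalize all coordinates into [0, 1] so they map onto the pixel grid.
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    scale = np.where(maxs - mins > 0, maxs - mins, 1.0)
    norm = (pts - mins) / scale
    # Quantize x and y into pixel indices.
    u = np.clip((norm[:, 0] * (img_size - 1)).round().astype(int), 0, img_size - 1)
    v = np.clip((norm[:, 1] * (img_size - 1)).round().astype(int), 0, img_size - 1)
    # "Color" each pixel with the tallest point that lands on it;
    # np.maximum.at handles duplicate pixel indices deterministically.
    img = np.zeros((img_size, img_size))
    np.maximum.at(img, (v, u), norm[:, 2])
    return img
```

The resulting single-channel image could then be replicated across three channels and fed to a frozen pre-trained image backbone, which is the spirit of the prompting step described above.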
Pages: 4381 - 4397
Page count: 17