Point-to-Pixel Prompting for Point Cloud Analysis With Pre-Trained Image Models

Cited by: 4
Authors
Wang, Ziyi [1 ]
Rao, Yongming [1 ]
Yu, Xumin [1 ]
Zhou, Jie [1 ]
Lu, Jiwen [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Point cloud compression; Three-dimensional displays; Task analysis; Solid modeling; Tuning; Analytical models; Feature extraction; Distillation; point cloud; prompt tuning;
DOI
10.1109/TPAMI.2024.3354961
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Nowadays, pre-training big models on large-scale datasets has achieved great success and come to dominate many downstream tasks in natural language processing and 2D vision, while pre-training in 3D vision is still under development. In this paper, we provide a new perspective on transferring pre-trained knowledge from the 2D domain to the 3D domain, with Point-to-Pixel Prompting in data space and Pixel-to-Point distillation in feature space, exploiting the knowledge shared between images and point clouds that depict the same visual world. Following the principles of prompt engineering, Point-to-Pixel Prompting transforms point clouds into colorful images with geometry-preserved projection and geometry-aware coloring, so that pre-trained image models can be applied directly to point cloud tasks without structural changes or weight modifications. Using the projection correspondence in feature space, Pixel-to-Point distillation then treats the pre-trained image model as a teacher and distills its 2D knowledge into a student point cloud model, remarkably improving inference efficiency and model capacity for point cloud analysis. We conduct extensive experiments on both object classification and scene segmentation under various settings to demonstrate the superiority of our method. In object classification, we reveal an important scale-up trend of Point-to-Pixel Prompting and attain 90.3% accuracy on the ScanObjectNN dataset, surpassing previous literature by a large margin. In scene-level semantic segmentation, our method outperforms traditional 3D analysis approaches and shows competitive capacity in dense prediction tasks.
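To make the projection step concrete, below is a minimal Python/PyTorch sketch of the Point-to-Pixel idea: a point cloud is orthographically splatted onto an image plane and each pixel is colored by normalized depth. The function name, the orthographic camera, and the depth-based coloring are illustrative assumptions only; the paper's geometry-aware coloring is a learned module, and its geometry-preserved projection is more elaborate than this simple z-buffer.

import torch

def point_to_pixel_prompt(points, img_size=224):
    # Hypothetical sketch of Point-to-Pixel Prompting.
    # points: (N, 3) tensor, coordinates assumed normalized to [-1, 1].
    # Returns a (3, img_size, img_size) image for a frozen 2D backbone.
    xy = ((points[:, :2] + 1) / 2 * (img_size - 1)).long().clamp(0, img_size - 1)
    depth = (points[:, 2] + 1) / 2                     # map z to [0, 1]
    flat = xy[:, 1] * img_size + xy[:, 0]              # flattened pixel index
    zbuf = torch.zeros(img_size * img_size)
    # Simple z-buffer: when several points land on one pixel, keep the
    # largest normalized depth (a stand-in for the paper's
    # geometry-preserved projection).
    zbuf.scatter_reduce_(0, flat, depth, reduce="amax")
    img = zbuf.view(img_size, img_size).expand(3, -1, -1)  # depth as RGB
    return img

An image produced this way can be fed, unchanged, into a frozen pre-trained image model (e.g., a ViT), which is the sense in which the prompting requires no structural changes or weight modifications.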
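Likewise, here is a hedged sketch of the Pixel-to-Point distillation objective, reusing the projection correspondence above: each student point feature is pulled toward the frozen teacher's 2D feature at the pixel that point projects to. The plain MSE loss and all names here are assumptions for illustration; the paper's exact distillation loss may differ.

import torch
import torch.nn.functional as F

def pixel_to_point_distill_loss(point_feats, img_feats, xy):
    # Hypothetical sketch of Pixel-to-Point distillation.
    # point_feats: (N, C) features from the student point cloud model.
    # img_feats:   (C, H, W) feature map from the frozen teacher image model.
    # xy:          (N, 2) integer pixel coordinates from the projection
    #              correspondence used in Point-to-Pixel Prompting.
    teacher = img_feats[:, xy[:, 1], xy[:, 0]].t()     # gather -> (N, C)
    # Align student point features with teacher pixel features; the
    # teacher is detached so only the student receives gradients.
    return F.mse_loss(point_feats, teacher.detach())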
Pages: 4381-4397
Page count: 17
Related Papers
50 records in total (10 shown)
  • [1] P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
    Wang, Ziyi
    Yu, Xumin
    Rao, Yongming
    Zhou, Jie
    Lu, Jiwen
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [2] Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models
    Zha, Yaohua
    Wang, Jinpeng
    Dai, Tao
Chen, Bin
    Wang, Zhi
    Xia, Shu-Tao
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 14115 - 14124
  • [3] Controllable Generation from Pre-trained Language Models via Inverse Prompting
    Zou, Xu
    Yin, Da
    Zhong, Qingyang
    Yang, Hongxia
    Yang, Zhilin
    Tang, Jie
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 2450 - 2460
  • [4] Probing Power by Prompting: Harnessing Pre-trained Language Models for Power Connotation Framing
    Khanehzar, Shima
    Cohn, Trevor
    Mikolajczak, Gosia
    Frermann, Lea
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 873 - 885
  • [5] PPDistiller: Weakly-supervised 3D point cloud semantic segmentation framework via point-to-pixel distillation
    Zhang, Yong
    Wu, Zhaolong
    Lan, Rukai
    Liang, Yingjie
    Liu, Yifan
    KNOWLEDGE-BASED SYSTEMS, 2024, 305
  • [6] Pre-trained CNNs Models for Content based Image Retrieval
    Ahmed, Ali
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (07) : 200 - 206
  • [7] Context Analysis for Pre-trained Masked Language Models
    Lai, Yi-An
    Lalwani, Garima
    Zhang, Yi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 3789 - 3804
  • [8] Exploratory Architectures Analysis of Various Pre-trained Image Classification Models for Deep Learning
    Deepa, S.
    Zeema, J. Loveline
    Gokila, S.
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2024, 15 (01) : 66 - 78
  • [9] MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators
    Tan, Zhixing
    Zhang, Xiangwen
    Wang, Shuo
    Liu, Yang
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 6131 - 6142
  • [10] Pre-Trained Image Processing Transformer
    Chen, Hanting
    Wang, Yunhe
    Guo, Tianyu
    Xu, Chang
    Deng, Yiping
    Liu, Zhenhua
    Ma, Siwei
    Xu, Chunjing
    Xu, Chao
    Gao, Wen
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 12294 - 12305