Point-to-Pixel Prompting for Point Cloud Analysis With Pre-Trained Image Models

Cited by: 4
Authors
Wang, Ziyi [1 ]
Rao, Yongming [1 ]
Yu, Xumin [1 ]
Zhou, Jie [1 ]
Lu, Jiwen [1 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Point cloud compression; Three-dimensional displays; Task analysis; Solid modeling; Tuning; Analytical models; Feature extraction; Distillation; point cloud; prompt tuning;
DOI
10.1109/TPAMI.2024.3354961
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Nowadays, pre-training big models on large-scale datasets has achieved great success and dominated many downstream tasks in natural language processing and 2D vision, while pre-training in 3D vision is still under development. In this paper, we provide a new perspective of transferring pre-trained knowledge from the 2D domain to the 3D domain with Point-to-Pixel Prompting in data space and Pixel-to-Point distillation in feature space, exploiting shared knowledge in images and point clouds that depict the same visual world. Following the principle of prompt engineering, Point-to-Pixel Prompting transforms point clouds into colorful images with geometry-preserved projection and geometry-aware coloring. The pre-trained image models can then be directly applied to point cloud tasks without structural changes or weight modifications. With projection correspondence in feature space, Pixel-to-Point distillation further regards pre-trained image models as the teacher and distills pre-trained 2D knowledge into student point cloud models, remarkably enhancing inference efficiency and model capacity for point cloud analysis. We conduct extensive experiments in both object classification and scene segmentation under various settings to demonstrate the superiority of our method. In object classification, we reveal the important scale-up trend of Point-to-Pixel Prompting and attain 90.3% accuracy on the ScanObjectNN dataset, surpassing previous literature by a large margin. In scene-level semantic segmentation, our method outperforms traditional 3D analysis approaches and shows competitive capacity in dense prediction tasks.
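The core idea of Point-to-Pixel Prompting described above can be illustrated with a minimal sketch: project a point cloud onto an image plane and color each pixel from the geometry of the points that land on it, so a frozen 2D image model can consume the result. This is an illustrative stand-in only; the function name `point_to_pixel`, the orthographic projection, and the depth-based pseudo-coloring are assumptions for clarity and do not reproduce the paper's actual projection or coloring schemes.

```python
import numpy as np

def point_to_pixel(points, img_size=64):
    """Project an (N, 3) point cloud to an (H, W, 3) image.

    Illustrative only: orthographic projection onto the XY plane,
    with depth (z) used as a simple geometry-aware color cue.
    """
    # Normalize the cloud into the unit cube [0, 1]^3.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    pts = (points - mins) / np.maximum(maxs - mins, 1e-8)

    # Map x, y to pixel coordinates; keep z as depth.
    cols = np.clip((pts[:, 0] * (img_size - 1)).astype(int), 0, img_size - 1)
    rows = np.clip((pts[:, 1] * (img_size - 1)).astype(int), 0, img_size - 1)
    depth = pts[:, 2]

    # Z-buffer: keep only the nearest point per pixel.
    image = np.zeros((img_size, img_size, 3), dtype=np.float32)
    zbuf = np.full((img_size, img_size), -np.inf)
    for r, c, z in zip(rows, cols, depth):
        if z > zbuf[r, c]:
            zbuf[r, c] = z
            # Depth-based pseudo-color: a stand-in for geometry-aware coloring.
            image[r, c] = (z, 1.0 - z, 0.5)
    return image

cloud = np.random.rand(1024, 3)
img = point_to_pixel(cloud)
print(img.shape)  # (64, 64, 3)
```

Because the output is an ordinary H×W×3 image, it can be fed to a pre-trained image backbone unchanged, which is what lets the prompting scheme reuse 2D weights without structural modification.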
Pages: 4381-4397
Page count: 17