Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation

Times cited: 6
Authors
Wang, Yuanbin [1]
Huang, Shaofei [2]
Gao, Yulu [1]
Wang, Zhen [3]
Wang, Rui [3]
Sheng, Kehua [3]
Zhang, Bo [3]
Liu, Si [1]
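Affiliations
[1] Beihang University, Institute of Artificial Intelligence, Beijing, People's Republic of China
[2] University of Chinese Academy of Sciences, Chinese Academy of Sciences, Institute of Information Engineering, School of Cyber Security, Beijing, People's Republic of China
[3] Didi Chuxing, Beijing, People's Republic of China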
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
Point Cloud Segmentation; Semantic Segmentation; Zero-Shot Learning; Cross-Modal Distillation
DOI
10.1145/3581783.3612107
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional 3D segmentation methods can recognize only the fixed set of classes that appear in the training set; this lack of generalization ability limits their application in real-world scenarios. Large-scale vision-language pre-trained models such as CLIP have shown strong generalization in zero-shot 2D vision tasks, but they cannot be applied directly to 3D semantic segmentation. In this work, we focus on zero-shot point cloud semantic segmentation and propose a simple yet effective baseline that transfers the visual-linguistic knowledge implied in CLIP to a point cloud encoder. For effective knowledge transfer, the 2D and 3D encoders are aligned at both the feature level and the output level. Concretely, for feature-level alignment, a Multi-granularity Cross-modal Feature Alignment (MCFA) module is proposed to align 2D and 3D features from both global semantic and local position perspectives. For output-level alignment, per-pixel pseudo labels of unseen classes are extracted with the pre-trained CLIP model and used as supervision, so that the 3D segmentation model learns to mimic the behavior of the CLIP image encoder. Extensive experiments are conducted on two popular point cloud segmentation benchmarks. Our method significantly outperforms previous state-of-the-art methods under the zero-shot setting (+29.2% mIoU on SemanticKITTI and 31.8% mIoU on nuScenes), and further achieves promising results in the annotation-free point cloud semantic segmentation setting, showing its potential for label-efficient learning.
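The abstract describes the two alignment levels only at a high level. Below is a minimal, dependency-free sketch of one plausible reading, not the authors' implementation. It assumes that per-pixel CLIP image features, CLIP text embeddings of class-name prompts, and a point-to-pixel projection index (-1 for points outside the camera view) have already been computed. Pseudo labels are taken as the class whose text embedding has the highest cosine similarity with each pixel feature and are then copied to points through the projection. The feature-level losses use 1 minus cosine similarity, with a per-point "local" term and a mean-pooled "global" term as a simplified stand-in for MCFA's global semantic and local position perspectives, whose actual design is not given here. All function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating the idea only; NOT the paper's implementation.
# Assumes CLIP pixel features, CLIP text embeddings, and point-to-pixel indices
# were computed elsewhere (here they are toy lists of floats).

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def pseudo_labels(pixel_feats: List[Vector], text_embeds: List[Vector]) -> List[int]:
    """Output level (assumed): label each pixel with its most similar class prompt."""
    labels = []
    for f in pixel_feats:
        scores = [cosine(f, t) for t in text_embeds]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


def point_labels_from_pixels(pixel_labels: List[int], point_to_pixel: List[int]) -> List[int]:
    """Copy pixel pseudo labels to points; -1 marks points with no paired pixel."""
    return [pixel_labels[p] if p >= 0 else -1 for p in point_to_pixel]


def local_alignment_loss(point_feats: List[Vector], pixel_feats: List[Vector],
                         point_to_pixel: List[int]) -> float:
    """Feature level, local term (assumed): mean (1 - cosine) over paired point/pixel features."""
    pairs = [(point_feats[i], pixel_feats[p])
             for i, p in enumerate(point_to_pixel) if p >= 0]
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)


def mean_pool(feats: List[Vector]) -> List[float]:
    """Average a non-empty list of equal-length vectors, dimension by dimension."""
    return [sum(col) / len(feats) for col in zip(*feats)]


def global_alignment_loss(point_feats: List[Vector], pixel_feats: List[Vector]) -> float:
    """Feature level, global term (assumed): 1 - cosine of mean-pooled features."""
    return 1.0 - cosine(mean_pool(point_feats), mean_pool(pixel_feats))


if __name__ == "__main__":
    text_embeds = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings for two class prompts
    pixel_feats = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    point_to_pixel = [0, 2, -1, 1]  # point 2 falls outside the image
    labels = pseudo_labels(pixel_feats, text_embeds)
    print(labels)  # [0, 1, 0]
    print(point_labels_from_pixels(labels, point_to_pixel))  # [0, 0, -1, 1]
```

In a practical pipeline these steps would more likely be batched tensor operations over many points and camera views; plain lists are used here only to keep the sketch self-contained.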
Pages: 3745–3754
Number of pages: 10