Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation

Cited by: 6
Authors
Wang, Yuanbin [1]
Huang, Shaofei [2]
Gao, Yulu [1]
Wang, Zhen [3]
Wang, Rui [3]
Sheng, Kehua [3]
Zhang, Bo [3]
Liu, Si [1]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Chinese Acad Sci, Inst Informat Engn, Sch Cyber Secur, Beijing, Peoples R China
[3] Didi Chuxing, Beijing, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Point Cloud Segmentation; Semantic Segmentation; Zero-Shot Learning; Cross-Modal Distillation;
DOI
10.1145/3581783.3612107
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Traditional 3D segmentation methods can only recognize a fixed set of classes that appear in the training set, which limits their application in real-world scenarios due to a lack of generalization ability. Large-scale visual-language pre-trained models, such as CLIP, have shown their generalization ability in zero-shot 2D vision tasks, but still cannot be applied directly to 3D semantic segmentation. In this work, we focus on zero-shot point cloud semantic segmentation and propose a simple yet effective baseline to transfer the visual-linguistic knowledge implied in CLIP to a point cloud encoder at both the feature and output levels. Both feature-level and output-level alignments are conducted between the 2D and 3D encoders for effective knowledge transfer. Concretely, a Multi-granularity Cross-modal Feature Alignment (MCFA) module is proposed to align 2D and 3D features from global semantic and local position perspectives for feature-level alignment. For the output level, per-pixel pseudo labels of unseen classes are extracted using the pre-trained CLIP model as supervision for the 3D segmentation model to mimic the behavior of the CLIP image encoder. Extensive experiments are conducted on two popular point cloud segmentation benchmarks. Our method significantly outperforms previous state-of-the-art methods under the zero-shot setting (+29.2% mIoU on SemanticKITTI and +31.8% mIoU on nuScenes), and further achieves promising results in the annotation-free point cloud semantic segmentation setting, showing its great potential for label-efficient learning.
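The two alignment levels described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the loss functions, so the cosine-distance feature loss, the argmax pseudo-labeling rule, and all function names below (`pseudo_labels`, `feature_alignment_loss`, `cross_entropy`) are assumptions chosen for illustration. The inputs are assumed to be 2D per-pixel features and class text embeddings that already live in a shared embedding space, with each 3D point paired to one pixel.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to (approximately) unit length along the given axis.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def pseudo_labels(pixel_feats, text_embeds):
    # Output level (assumed rule): label each pixel with the class whose
    # text embedding has the highest cosine similarity to the pixel feature.
    # pixel_feats: (N, D), text_embeds: (C, D) -> (N,) class indices.
    sims = l2_normalize(pixel_feats) @ l2_normalize(text_embeds).T
    return sims.argmax(axis=1)

def feature_alignment_loss(point_feats, pixel_feats):
    # Feature level (assumed loss): mean cosine distance between each
    # 3D point feature and the 2D feature of its paired pixel.
    cos = np.sum(l2_normalize(point_feats) * l2_normalize(pixel_feats), axis=1)
    return float(np.mean(1.0 - cos))

def cross_entropy(logits, labels):
    # Numerically stable cross-entropy of 3D logits against pseudo labels.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_points, dim, n_classes = 5, 8, 3
    # Random arrays stand in for real 2D, text, and 3D encoder outputs.
    pixel_feats = rng.normal(size=(n_points, dim))
    text_embeds = rng.normal(size=(n_classes, dim))
    point_feats = rng.normal(size=(n_points, dim))

    labels = pseudo_labels(pixel_feats, text_embeds)
    point_logits = l2_normalize(point_feats) @ l2_normalize(text_embeds).T
    total = feature_alignment_loss(point_feats, pixel_feats) + cross_entropy(point_logits, labels)
    print(labels, total)
```

In this sketch the 3D branch is pulled toward the 2D branch twice: once in feature space and once through the pseudo labels. The global and local granularities of the MCFA module are not modeled here.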
Pages: 3745 - 3754
Page count: 10
Related papers
50 in total
  • [41] Context-aware Feature Generation for Zero-shot Semantic Segmentation
    Gu, Zhangxuan
    Zhou, Siyuan
    Niu, Li
    Zhao, Zihan
    Zhang, Liqing
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1921 - 1929
  • [42] Episode-based Training Strategy for Zero-Shot Semantic Segmentation
    Xiong, Bo
    Liu, Jianming
    Jing, Zhuoxun
    FOURTEENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING, ICGIP 2022, 2022, 12705
  • [43] Prototypical Matching and Open Set Rejection for Zero-Shot Semantic Segmentation
    Zhang, Hui
    Ding, Henghui
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 6954 - 6963
  • [44] Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation
    Baek, Donghyeon
    Oh, Youngmin
    Ham, Bumsub
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9516 - 9525
  • [45] MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis
    Zhong, Ziming
    Xu, Yanyu
    Li, Jing
    Xu, Jiale
    Li, Zhengxin
    Yu, Chaohui
    Gao, Shenghua
    COMPUTER VISION - ECCV 2024, PT LXXVII, 2024, 15135 : 182 - 199
  • [46] SATR: Zero-Shot Semantic Segmentation of 3D Shapes
    Abdelreheem, Ahmed
    Skorokhodov, Ivan
    Ovsjanikov, Maks
    Wonka, Peter
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15120 - 15133
  • [47] Transferring Knowledge From Text to Video: Zero-Shot Anticipation for Procedural Actions
    Sener, Fadime
    Saraf, Rishabh
    Yao, Angela
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7836 - 7852
  • [48] Visual Semantic Segmentation Based on Few/Zero-Shot Learning: An Overview
    Ren, Wenqi
    Tang, Yang
    Sun, Qiyu
    Zhao, Chaoqiang
    Han, Qing-Long
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (05) : 1106 - 1126
  • [49] Human-Guided Zero-Shot Surface Defect Semantic Segmentation
    Jin, Yuxin
    Zhang, Yunzhou
    Shan, Dexing
    Wu, Zhifei
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2025, 74
  • [50] CGViT: Cross-image GroupViT for zero-shot semantic segmentation
    Jiang, Jie
    He, Xingjian
    Zhu, Xinxin
    Wang, Weining
    Liu, Jing
    PATTERN RECOGNITION, 2025, 164