Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation

Cited by: 6
Authors
Wang, Yuanbin [1]
Huang, Shaofei [2]
Gao, Yulu [1]
Wang, Zhen [3]
Wang, Rui [3]
Sheng, Kehua [3]
Zhang, Bo [3]
Liu, Si [1]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Chinese Acad Sci, Inst Informat Engn, Sch Cyber Secur, Beijing, Peoples R China
[3] Didi Chuxing, Beijing, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Point Cloud Segmentation; Semantic Segmentation; Zero-Shot Learning; Cross-Modal Distillation;
DOI
10.1145/3581783.3612107
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional 3D segmentation methods can only recognize the fixed set of classes that appear in the training set, and this lack of generalization limits their use in real-world scenarios. Large-scale vision-language pre-trained models such as CLIP have shown strong generalization in zero-shot 2D vision tasks, but they cannot be applied directly to 3D semantic segmentation. In this work, we focus on zero-shot point cloud semantic segmentation and propose a simple yet effective baseline that transfers the visual-linguistic knowledge embedded in CLIP to a point cloud encoder, aligning the 2D and 3D encoders at both the feature level and the output level. Concretely, for feature-level alignment, a Multi-granularity Cross-modal Feature Alignment (MCFA) module aligns 2D and 3D features from both a global semantic perspective and a local positional perspective. For output-level alignment, per-pixel pseudo labels of unseen classes are extracted with the pre-trained CLIP model and used to supervise the 3D segmentation model, so that it learns to mimic the behavior of the CLIP image encoder. Extensive experiments on two popular point cloud segmentation benchmarks show that our method significantly outperforms previous state-of-the-art methods in the zero-shot setting (+29.2% mIoU on SemanticKITTI and +31.8% mIoU on nuScenes). It also achieves promising results in the annotation-free point cloud semantic segmentation setting, showing great potential for label-efficient learning.
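The abstract combines two kinds of 2D-to-3D knowledge transfer. The sketch below illustrates the general idea in plain Python under stated assumptions; it is not the authors' implementation and does not reproduce the MCFA module's global/local design. It assumes points are already in the camera frame, a pinhole camera with hypothetical intrinsics fx, fy, cx, cy, precomputed per-pixel CLIP image features, and CLIP text embeddings for each class prompt. All function names, the toy data, and the temperature tau=0.07 are illustrative assumptions. Points are paired with pixels by projection, each pixel receives a zero-shot pseudo label from its most similar text embedding, a cosine-distance loss pulls each point feature toward its paired pixel feature (feature level), and a cross-entropy loss trains the point predictions on the pseudo labels (output level).

```python
import math

# Hypothetical sketch of CLIP-to-point-cloud transfer using plain Python lists.
# Not the paper's MCFA module; all names, toy data, and tau are assumptions.

def l2_normalize(v):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def project_to_pixels(points, fx, fy, cx, cy, width, height):
    """Pinhole projection of camera-frame points (x, y, z) to integer pixels.
    Returns (u, v) per point, or None if the point is behind the camera
    or lands outside the image."""
    pixels = []
    for x, y, z in points:
        if z <= 0:
            pixels.append(None)
            continue
        u, v = math.floor(fx * x / z + cx), math.floor(fy * y / z + cy)
        pixels.append((u, v) if 0 <= u < width and 0 <= v < height else None)
    return pixels

def clip_pseudo_labels(pixel_feats, text_embeds):
    """Zero-shot label per pixel: index of the most similar class text embedding."""
    classes = range(len(text_embeds))
    return [max(classes, key=lambda c: cosine(f, text_embeds[c])) for f in pixel_feats]

def feature_alignment_loss(point_feats, pixel_feats):
    """Feature level: mean cosine distance between paired point and pixel features."""
    pairs = list(zip(point_feats, pixel_feats))
    return sum(1.0 - cosine(p, q) for p, q in pairs) / len(pairs)

def output_level_loss(point_feats, text_embeds, labels, tau=0.07):
    """Output level: cross-entropy of softmax(cos(point, text) / tau) vs. pseudo labels."""
    total = 0.0
    for f, y in zip(point_feats, labels):
        logits = [cosine(f, t) / tau for t in text_embeds]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_z - logits[y]
    return total / len(labels)

# Toy usage: two hypothetical class prompts ("road", "car") in a 2-D embedding space.
text_embeds = [[1.0, 0.0], [0.0, 1.0]]
points = [(0.0, 0.0, 2.0), (1.0, 0.5, 4.0), (0.0, 0.0, -1.0)]
pixels = project_to_pixels(points, fx=100, fy=100, cx=50, cy=50, width=100, height=100)
image_feats = {(50, 50): [0.9, 0.1], (75, 62): [0.2, 0.8]}  # toy per-pixel CLIP features
visible = [i for i, px in enumerate(pixels) if px is not None]
pixel_feats = [image_feats[pixels[i]] for i in visible]
point_feats = [[0.8, 0.3], [0.1, 0.9]]  # toy 3D encoder outputs for the visible points
labels = clip_pseudo_labels(pixel_feats, text_embeds)
feat_loss = feature_alignment_loss(point_feats, pixel_feats)
out_loss = output_level_loss(point_feats, text_embeds, labels)
```

In this sketch, point features are scored against the same text embeddings used to label the pixels, which is what would let a class without any 3D annotation still be predicted; a real implementation would operate on batched tensors rather than Python lists.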
Pages: 3745-3754
Number of pages: 10