Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation

Cited by: 6
Authors
Wang, Yuanbin [1]
Huang, Shaofei [2]
Gao, Yulu [1]
Wang, Zhen [3]
Wang, Rui [3]
Sheng, Kehua [3]
Zhang, Bo [3]
Liu, Si [1]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Chinese Acad Sci, Inst Informat Engn, Sch Cyber Secur, Beijing, Peoples R China
[3] Didi Chuxing, Beijing, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
Point Cloud Segmentation; Semantic Segmentation; Zero-Shot Learning; Cross-Modal Distillation
DOI
10.1145/3581783.3612107
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Traditional 3D segmentation methods can only recognize the fixed set of classes that appear in the training set, and this lack of generalization limits their application in real-world scenarios. Large-scale vision-language pre-trained models such as CLIP have demonstrated strong generalization in zero-shot 2D vision tasks, but they cannot be applied directly to 3D semantic segmentation. In this work, we focus on zero-shot point cloud semantic segmentation and propose a simple yet effective baseline that transfers the visual-linguistic knowledge implied in CLIP to a point cloud encoder by aligning the 2D and 3D encoders at both the feature level and the output level. Concretely, for feature-level alignment, a Multi-granularity Cross-modal Feature Alignment (MCFA) module aligns 2D and 3D features from a global semantic perspective and a local position perspective. For output-level alignment, per-pixel pseudo labels of unseen classes are extracted with the pre-trained CLIP model and used as supervision, so that the 3D segmentation model learns to mimic the behavior of the CLIP image encoder. Extensive experiments are conducted on two popular point cloud segmentation benchmarks. Our method significantly outperforms previous state-of-the-art methods under the zero-shot setting (+29.2% mIoU on SemanticKITTI and +31.8% mIoU on nuScenes), and further achieves promising results in the annotation-free point cloud semantic segmentation setting, showing its potential for label-efficient learning.
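The abstract describes two transfer paths only at a high level. The sketch below illustrates the generic pattern behind them, under stated assumptions; it is not the authors' implementation. It assumes dense, CLIP-like per-pixel features, one text embedding per class-name prompt, and precomputed integer pixel coordinates for each projected 3D point. The code shows three steps: (1) zero-shot pixel labeling by cosine similarity against the text embeddings, (2) copying those pixel pseudo labels to the points that project into the image, and (3) a simple per-point cosine alignment loss between paired 3D and 2D features. It does not reproduce the MCFA module, whose global-semantic branch is not specified in this record. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def pixel_pseudo_labels(pixel_feats, text_embeds):
    """Hypothetical helper: label each pixel with its most similar class.

    pixel_feats: (H, W, D) dense image features (assumed CLIP-like).
    text_embeds: (C, D) one embedding per class-name prompt (assumed).
    Returns an (H, W) integer label map.
    """
    sims = l2_normalize(pixel_feats) @ l2_normalize(text_embeds).T  # (H, W, C)
    return sims.argmax(axis=-1)


def project_labels_to_points(points_uv, label_map, ignore_index=-1):
    """Hypothetical helper: copy pixel labels to points that land in the image.

    points_uv: (N, 2) integer pixel coordinates (u = column, v = row) obtained
    by projecting each 3D point with known camera parameters (assumed given).
    Points outside the image receive ignore_index and give no supervision.
    """
    h, w = label_map.shape
    u, v = points_uv[:, 0], points_uv[:, 1]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels = np.full(len(points_uv), ignore_index, dtype=np.int64)
    labels[inside] = label_map[v[inside], u[inside]]
    return labels


def feature_alignment_loss(point_feats, paired_pixel_feats):
    """Mean (1 - cosine similarity) over paired 3D point and 2D pixel features."""
    cos = np.sum(l2_normalize(point_feats) * l2_normalize(paired_pixel_feats), axis=-1)
    return float(np.mean(1.0 - cos))


if __name__ == "__main__":
    # Toy example: two classes in a 2-D embedding space.
    text = np.array([[1.0, 0.0], [0.0, 1.0]])
    pix = np.zeros((2, 2, 2))
    pix[..., 0] = 1.0          # every pixel resembles class 0 ...
    pix[1, 1] = [0.0, 1.0]     # ... except the bottom-right pixel (class 1)
    label_map = pixel_pseudo_labels(pix, text)
    uv = np.array([[0, 0], [1, 1], [5, 5]])
    # labels: 0, 1, -1 (the last point falls outside the image)
    print(project_labels_to_points(uv, label_map).tolist())
```

The resulting point labels would serve as targets for a standard per-point classification loss, with the ignore index masking points that no camera sees; the alignment loss would be added alongside it.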
Pages: 3745-3754
Page count: 10