Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation

Times cited: 6

Authors:

Wang, Yuanbin [1]

Huang, Shaofei [2]

Gao, Yulu [1]

Wang, Zhen [3]

Wang, Rui [3]

Sheng, Kehua [3]

Zhang, Bo [3]

Liu, Si [1]

Affiliations:

[1] Beihang Univ, Inst Artificial Intelligence, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Chinese Acad Sci, Inst Informat Engn, Sch Cyber Secur, Beijing, Peoples R China

[3] Didi Chuxing, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

Point Cloud Segmentation; Semantic Segmentation; Zero-Shot Learning; Cross-Modal Distillation;

DOI:

10.1145/3581783.3612107

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Traditional 3D segmentation methods can only recognize the fixed set of classes that appear in the training set; this lack of generalization limits their application in real-world scenarios. Large-scale visual-language pre-trained models such as CLIP have shown strong generalization in zero-shot 2D vision tasks, but they cannot be applied directly to 3D semantic segmentation. In this work, we focus on zero-shot point cloud semantic segmentation and propose a simple yet effective baseline that transfers the visual-linguistic knowledge implied in CLIP to a point cloud encoder by aligning the 2D and 3D encoders at both the feature level and the output level. Concretely, for feature-level alignment, a Multi-granularity Cross-modal Feature Alignment (MCFA) module aligns 2D and 3D features from both a global semantic perspective and a local position perspective. For output-level alignment, per-pixel pseudo labels of unseen classes are extracted with the pre-trained CLIP model and used as supervision, so that the 3D segmentation model learns to mimic the behavior of the CLIP image encoder. Extensive experiments are conducted on two popular point cloud segmentation benchmarks. Under the zero-shot setting, our method significantly outperforms previous state-of-the-art methods (+29.2% mIoU on SemanticKITTI and +31.8% mIoU on nuScenes), and it further achieves promising results in the annotation-free point cloud semantic segmentation setting, showing its great potential for label-efficient learning.
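The two supervision signals described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce MCFA: it assumes that dense per-pixel CLIP embeddings, class-name text embeddings, and a point-to-pixel correspondence `point_to_pixel` (from camera calibration) are already computed. Output-level pseudo labels are taken as the class whose text embedding has the highest cosine similarity with the pixel a point projects onto, and a plain mean cosine-distance loss stands in for feature-level alignment. All function and parameter names (`pixel_pseudo_label`, `point_pseudo_labels`, `feature_alignment_loss`, `point_to_pixel`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so that cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pixel_pseudo_label(pixel_feat: Vector, text_embeds: Sequence[Vector]) -> int:
    """Hypothetical helper: index of the class text embedding most similar to one pixel feature."""
    p = normalize(pixel_feat)
    scores = [dot(p, normalize(t)) for t in text_embeds]
    return max(range(len(scores)), key=scores.__getitem__)


def point_pseudo_labels(
    point_to_pixel: Sequence[Tuple[int, int]],
    pixel_feats: Sequence[Sequence[Vector]],
    text_embeds: Sequence[Vector],
) -> List[int]:
    """Give each 3D point the pseudo label of the pixel it projects onto.

    point_to_pixel[i] = (row, col) is assumed to be precomputed from camera
    calibration; points that fall outside the image are not handled here.
    """
    return [pixel_pseudo_label(pixel_feats[r][c], text_embeds) for r, c in point_to_pixel]


def feature_alignment_loss(feats_3d: Sequence[Vector], feats_2d: Sequence[Vector]) -> float:
    """Generic stand-in for feature-level alignment: mean (1 - cosine similarity)
    over paired point/pixel features. This is NOT the paper's MCFA module."""
    losses = [1.0 - dot(normalize(a), normalize(b)) for a, b in zip(feats_3d, feats_2d)]
    return sum(losses) / len(losses)
```

For example, with two toy class embeddings `[[1.0, 0.0], [0.0, 1.0]]` and a 1x2 "image" `[[[0.9, 0.1], [0.2, 0.8]]]`, the points mapped to pixels `(0, 0)`, `(0, 1)`, `(0, 1)` receive pseudo labels `[0, 1, 1]`. Because the labels come only from CLIP and the text embeddings, any class that has a text prompt, including unseen ones, can appear in them, which is what makes this style of supervision usable in the zero-shot setting.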


Pages: 3745-3754

Number of pages: 10
