ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation

Cited by: 63
Authors
Zhou, Ziqin [1 ]
Lei, Yinjie [2 ]
Zhang, Bowen [1 ]
Liu, Lingqiao [1 ]
Liu, Yifan [1 ]
Affiliations
[1] Univ Adelaide, Adelaide, Australia
[2] Sichuan Univ, Chengdu, Peoples R China
DOI
10.1109/CVPR52729.2023.01075
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, CLIP has been applied to pixel-level zero-shot learning tasks via a two-stage scheme. The general idea is to first generate class-agnostic region proposals and then feed the cropped proposal regions to CLIP to exploit its image-level zero-shot classification capability. While effective, such a scheme requires two image encoders, one for proposal generation and one for CLIP, leading to a complicated pipeline and high computational cost. In this work, we pursue a simpler and more efficient one-stage solution that directly extends CLIP's zero-shot prediction capability from the image level to the pixel level. Our investigation starts with a straightforward extension as our baseline, which generates semantic masks by comparing the similarity between text embeddings and patch embeddings extracted from CLIP. However, such a paradigm can heavily overfit the seen classes and fail to generalize to unseen classes. To handle this issue, we propose three simple but effective designs and find that they substantially preserve the inherent zero-shot capacity of CLIP and improve pixel-level generalization. Incorporating these modifications yields an efficient zero-shot semantic segmentation system called ZegCLIP. In extensive experiments on three public benchmarks, ZegCLIP demonstrates superior performance, outperforming state-of-the-art methods by a large margin under both the "inductive" and "transductive" zero-shot settings. In addition, compared with the two-stage method, our one-stage ZegCLIP runs about 5 times faster during inference. We release the code at https://github.com/ZiqinZhou66/ZegCLIP.git.
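The one-stage baseline described in the abstract (score every image patch against every class text embedding, then label each patch with its best-matching class) can be sketched as below. This is a minimal illustration under stated assumptions, not ZegCLIP's implementation: the embeddings are toy vectors standing in for CLIP outputs, the function names and the 0.07 temperature are hypothetical, upsampling from the patch grid to full pixel resolution is omitted, and the paper's three additional designs are not modeled.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def patch_text_logits(
    patches: List[Vector], texts: List[Vector], temperature: float = 0.07
) -> List[List[float]]:
    """Cosine similarity between each patch and each class text embedding.

    The division by a (hypothetical) temperature only rescales the scores,
    as would be done before a softmax during training; it does not change
    which class scores highest.
    """
    p = [l2_normalize(v) for v in patches]
    t = [l2_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(pi, tj)) / temperature for tj in t] for pi in p]


def segment(
    patches: List[Vector], texts: List[Vector], grid_h: int, grid_w: int,
    temperature: float = 0.07,
) -> List[List[int]]:
    """Assign each patch the index of its most similar class; return a grid_h x grid_w map."""
    if len(patches) != grid_h * grid_w:
        raise ValueError("number of patches must equal grid_h * grid_w")
    logits = patch_text_logits(patches, texts, temperature)
    labels = [max(range(len(row)), key=row.__getitem__) for row in logits]
    # Patches are assumed to be in row-major order.
    return [labels[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


# Toy example: two hypothetical class text embeddings and a 2x2 grid of patch embeddings.
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
patches = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.7, 0.3, 0.0], [0.1, 0.95, 0.2]]
mask = segment(patches, texts, 2, 2)
```

With these toy vectors, `segment(patches, texts, 2, 2)` returns `[[0, 1], [0, 1]]`: the left column is closest to class 0 and the right column to class 1. Because the class set enters only through the text embeddings, swapping in embeddings for unseen class names changes the label space without retraining, which is the property the abstract says this naive baseline tends to lose by overfitting the seen classes.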
Pages: 11175-11185
Page count: 11