Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Times cited: 0
Authors
Shao, Tong [1]
Tian, Zhuotao [1]
Zhao, Hang [1]
Su, Jingyong [1]
Affiliation
[1] Harbin Inst Technol, Shenzhen, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT LXXXVI | 2025 / Vol. 15144
Funding
National Natural Science Foundation of China;
Keywords
CLIP; Training-free; Semantic Segmentation;
DOI
10.1007/978-3-031-73016-0_9
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite this success, applying CLIP to OVSS remains challenging because it was originally trained for image-level alignment, which limits its performance on tasks that require detailed local context. Our study examines how CLIP's [CLS] token affects the correlations among patch features, revealing a dominance of "global" patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach notably improves segmentation accuracy and better maintains semantic coherence across objects. Experiments show that CLIPtrase outperforms CLIP by 22.3% on average across 9 segmentation benchmarks and surpasses existing state-of-the-art training-free methods. The code is publicly available at https://github.com/leaves162/CLIPtrase.
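To make the phrase "self-correlation among patches" concrete, here is a minimal, generic sketch of training-free dense prediction that re-aggregates patch features by their mutual cosine similarity and then labels each patch by its closest text embedding. It is not the CLIPtrase recalibration described in the paper: the function names, the softmax temperature of 0.1, and the assumption that patch and text embeddings already share one space are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each vector to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def self_correlation_aggregate(patch_feats, temperature=0.1):
    """Re-aggregate patch features using patch-to-patch cosine similarity.

    patch_feats: (N, D) array of patch embeddings (hypothetical input).
    This is a generic self-correlation step, NOT the paper's recalibration.
    """
    x = l2_normalize(patch_feats)
    sim = x @ x.T                                   # (N, N) self-correlation matrix
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ patch_feats                    # each patch mixes in features of similar patches

def classify_patches(patch_feats, text_embeds):
    # Assign each patch the class whose text embedding is most cosine-similar.
    scores = l2_normalize(patch_feats) @ l2_normalize(text_embeds).T
    return scores.argmax(axis=1)

if __name__ == "__main__":
    # Toy data: two patches lean toward class 0, two toward class 1.
    patches = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]])
    text = np.array([[1.0, 0.0], [0.0, 1.0]])
    labels = classify_patches(self_correlation_aggregate(patches), text)
    print(labels.tolist())
```

In this toy run, the patches that resemble each other reinforce one another, so the predicted labels are `[0, 0, 1, 1]`. A single "global" patch that is similar to every other patch would receive weight in every row of the softmax, which is the kind of local-feature washing-out the abstract attributes to CLIP's [CLS]-driven correlations.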
Pages: 139-156
Number of pages: 18