Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Cited by: 0
Authors
Shao, Tong [1 ]
Tian, Zhuotao [1 ]
Zhao, Hang [1 ]
Su, Jingyong [1 ]
Affiliations
[1] Harbin Institute of Technology, Shenzhen, People's Republic of China
Source
COMPUTER VISION - ECCV 2024, PT LXXXVI, 2025, Vol. 15144
Funding
National Natural Science Foundation of China
Keywords
CLIP; Training-free; Semantic Segmentation
DOI
10.1007/978-3-031-73016-0_9
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite this success, applying CLIP to OVSS remains challenging because its image-level alignment training limits its performance in tasks requiring detailed local context. Our study delves into the impact of CLIP's [CLS] token on patch feature correlations, revealing a dominance of "global" patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach yields notable improvements in segmentation accuracy while maintaining semantic coherence across objects. Experiments show that our method outperforms CLIP by 22.3% on average across 9 segmentation benchmarks, surpassing existing state-of-the-art training-free methods. The code is publicly available at https://github.com/leaves162/CLIPtrase.
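To make the patch self-correlation idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it assumes pre-extracted CLIP ViT patch embeddings and text prompt embeddings, and the names recalibrated_segmentation_logits, patch_feats, text_feats, and tau are illustrative placeholders. It shows one plausible way to recalibrate patch-patch correlation and use it to refine patch-text similarity for training-free segmentation.

```python
# Hypothetical sketch of recalibrated patch self-correlation for
# training-free open-vocabulary segmentation (not the CLIPtrase code).
import torch
import torch.nn.functional as F

def recalibrated_segmentation_logits(patch_feats: torch.Tensor,
                                     text_feats: torch.Tensor,
                                     tau: float = 0.07) -> torch.Tensor:
    """patch_feats: (N, D) patch embeddings from CLIP's vision encoder,
    excluding the [CLS] token; text_feats: (C, D) class prompt embeddings
    from CLIP's text encoder. Returns (N, C) per-patch class logits."""
    p = F.normalize(patch_feats, dim=-1)   # (N, D) unit-norm patch features
    t = F.normalize(text_feats, dim=-1)    # (C, D) unit-norm text features

    # Patch-patch self-correlation; the row-wise softmax re-weights each
    # patch toward semantically similar patches, countering the dominance
    # of "global" patches over local feature discrimination.
    corr = p @ p.t()                       # (N, N) cosine similarities
    attn = (corr / tau).softmax(dim=-1)    # (N, N) row-stochastic weights

    # Propagate raw patch-text similarities through the recalibrated
    # correlation to obtain locally coherent logits.
    logits = p @ t.t()                     # (N, C) raw similarities
    return attn @ logits                   # (N, C) recalibrated logits
```

Taking an argmax over the class dimension of the returned logits and reshaping to the patch grid would give a coarse segmentation map under these assumptions.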
Pages: 139-156
Number of pages: 18