Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Cited by: 0
Authors
Shao, Tong [1]
Tian, Zhuotao [1]
Zhao, Hang [1]
Su, Jingyong [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT LXXXVI | 2025 / Vol. 15144
Funding
National Natural Science Foundation of China;
Keywords
CLIP; Training-free; Semantic Segmentation;
DOI
10.1007/978-3-031-73016-0_9
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite this success, applying CLIP to OVSS remains challenging because it was originally trained for image-level alignment, which degrades its performance on tasks that require detailed local context. Our study examines the impact of CLIP's [CLS] token on patch feature correlations and reveals a dominance of "global" patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach yields notable improvements in segmentation accuracy and in the ability to maintain semantic coherence across objects. Experiments show that CLIPtrase outperforms CLIP by 22.3% on average across 9 segmentation benchmarks and surpasses existing state-of-the-art training-free methods. The code is publicly available at https://github.com/leaves162/CLIPtrase.
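As a rough illustration of what "self-correlation among patches" can mean in a training-free pipeline, the sketch below computes cosine similarities between patch features, turns them into attention-like weights, and uses the re-weighted features to assign each patch to a text class. It is a generic sketch under stated assumptions and not the CLIPtrase recalibration itself, which the abstract does not specify: random vectors stand in for CLIP patch and text embeddings, and the names `segment_patches`, `temperature`, and the 4 x 4 patch grid are hypothetical.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def segment_patches(patch_feats, text_embeds, temperature=0.1):
    """Hypothetical helper: label each patch via patch-to-patch correlation.

    patch_feats: N vectors of size D (placeholder for CLIP patch features).
    text_embeds: C vectors of size D (placeholder for CLIP text embeddings).
    Returns (labels, weights): one class index per patch, and the N x N
    row-normalized self-correlation weights.
    """
    patches = [l2_normalize(p) for p in patch_feats]
    texts = [l2_normalize(t) for t in text_embeds]
    # Cosine self-correlation between every pair of patches, softmaxed per row.
    weights = [
        softmax([dot(p, q) / temperature for q in patches]) for p in patches
    ]
    dim = len(patches[0])
    labels = []
    for w in weights:
        # Aggregate patch features using the correlation weights.
        refined = [
            sum(w[j] * patches[j][d] for j in range(len(patches)))
            for d in range(dim)
        ]
        refined = l2_normalize(refined)
        # Assign the class whose text embedding is most similar.
        scores = [dot(refined, t) for t in texts]
        labels.append(scores.index(max(scores)))
    return labels, weights


# Assumption: random data stands in for real CLIP outputs (16 patches, 3 classes).
random.seed(0)
patch_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
text_embeds = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]
labels, weights = segment_patches(patch_feats, text_embeds)
label_map = [labels[r * 4:(r + 1) * 4] for r in range(4)]  # 4 x 4 patch grid
```

With real features, `label_map` would be upsampled to the image resolution to obtain a segmentation mask. For the method actually proposed in the paper, consult the authors' repository linked in the abstract.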
Pages: 139 - 156
Page count: 18