Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Cited by: 0
Authors
Shao, Tong [1]
Tian, Zhuotao [1]
Zhao, Hang [1]
Su, Jingyong [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT LXXXVI | 2025 / Volume 15144
Funding
National Natural Science Foundation of China;
Keywords
CLIP; Training-free; Semantic Segmentation;
DOI
10.1007/978-3-031-73016-0_9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its success, its application to OVSS faces challenges due to its initial image-level alignment training, which affects its performance in tasks requiring detailed local context. Our study delves into the impact of CLIP's [CLS] token on patch feature correlations, revealing a dominance of "global" patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach demonstrates notable improvements in segmentation accuracy and the ability to maintain semantic coherence across objects. Experiments show that CLIPtrase is 22.3% ahead of CLIP on average across 9 segmentation benchmarks, outperforming existing state-of-the-art training-free methods. The code is publicly available at https://github.com/leaves162/CLIPtrase.
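Illustrative note: the abstract refers to self-correlation among patch features. The sketch below is not the CLIPtrase implementation and is not taken from the paper or its repository; it only shows, under simple assumptions, what a patch self-correlation step could look like. The function names, the cosine similarity measure, the softmax temperature, and the toy feature values are all hypothetical choices made for illustration. The [CLS] token is assumed to have been dropped already, so only patch features are passed in.

```python
import math

def cosine_self_correlation(patches):
    """Return the N x N cosine-similarity matrix of N patch feature vectors."""
    normed = []
    for vec in patches:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against zero vectors
        normed.append([x / norm for x in vec])
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]

def softmax(row, temperature=1.0):
    """Numerically stable softmax over one row of similarities."""
    m = max(row)
    exps = [math.exp((x - m) / temperature) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_by_self_correlation(patches, temperature=0.1):
    """Re-express each patch as a similarity-weighted average of all patches.

    Assumption: a low temperature makes each patch attend mostly to patches
    that resemble it, which is one simple way to emphasise local structure.
    """
    corr = cosine_self_correlation(patches)
    weights = [softmax(row, temperature) for row in corr]
    dim = len(patches[0])
    return [
        [sum(w * p[d] for w, p in zip(w_row, patches)) for d in range(dim)]
        for w_row in weights
    ]

# Toy input (hypothetical values): patches 0 and 1 are similar, patch 2 differs.
toy_patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
refined = aggregate_by_self_correlation(toy_patches)
```

In this toy case the refined features of patches 0 and 1 stay close to each other while patch 2 remains distinct, which is the kind of local grouping that a per-patch classifier against text embeddings would rely on. For the actual method, see the paper and the repository linked in the abstract.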
Pages: 139-156
Page count: 18