Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Citations: 0

Authors:

Shao, Tong [1]

Tian, Zhuotao [1]

Zhao, Hang [1]

Su, Jingyong [1]

Affiliations:

[1] Harbin Institute of Technology, Shenzhen, People's Republic of China

Source:

COMPUTER VISION - ECCV 2024, PT LXXXVI | 2025 / Vol. 15144

Funding:

National Natural Science Foundation of China;

Keywords:

CLIP; Training-free; Semantic Segmentation;

DOI:

10.1007/978-3-031-73016-0_9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite this success, applying CLIP to OVSS is challenging because it was originally trained for image-level alignment, which limits its performance on tasks that require detailed local context. Our study examines the impact of CLIP's [CLS] token on patch feature correlations and reveals a dominance of "global" patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach demonstrates notable improvements in segmentation accuracy and in the ability to maintain semantic coherence across objects. Experiments show that CLIPtrase is 22.3% ahead of CLIP on average across 9 segmentation benchmarks, outperforming existing state-of-the-art training-free methods. The code is publicly available at https://github.com/leaves162/CLIPtrase.
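
The idea of weighting patches by their self-correlation can be illustrated with a small NumPy sketch. This is not the CLIPtrase implementation: the function names, the cosine-similarity weighting, and the `temperature` value are all assumptions chosen for illustration, and the record above does not specify how the recalibration is actually done. The sketch only shows how patch-to-patch similarity can be used to re-aggregate patch features before matching them against text embeddings.

```python
import numpy as np


def _l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length (eps guards against zero vectors)."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def self_correlation_refine(patches: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """Hypothetical helper: re-aggregate patch features using weights
    derived from patch-to-patch cosine similarity.

    patches: (N, D) array of patch features.
    Returns an (N, D) array in which each row is a similarity-weighted
    average of all patch features.
    """
    normed = _l2_normalize(patches)
    similarity = normed @ normed.T                 # (N, N) cosine similarity
    logits = similarity / temperature              # assumed sharpening factor
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ patches


def label_patches(patches: np.ndarray, text_embeds: np.ndarray) -> np.ndarray:
    """Hypothetical helper: assign each patch the index of the closest
    text embedding after self-correlation refinement."""
    refined = _l2_normalize(self_correlation_refine(patches))
    text = _l2_normalize(text_embeds)
    return (refined @ text.T).argmax(axis=1)
```

In this sketch each patch borrows features mainly from patches that resemble it, so a patch is not dominated by a few globally attended positions. The actual method should be taken from the paper and the linked repository.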


Pages: 139-156

Page count: 18
