Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Cited by: 0
Authors
Shao, Tong [1 ]
Tian, Zhuotao [1 ]
Zhao, Hang [1 ]
Su, Jingyong [1 ]
Affiliations
[1] Harbin Institute of Technology, Shenzhen, People's Republic of China
Source
COMPUTER VISION - ECCV 2024, PT LXXXVI, 2025, Vol. 15144
Funding
National Natural Science Foundation of China
Keywords
CLIP; Training-free; Semantic Segmentation
DOI
10.1007/978-3-031-73016-0_9
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite this success, applying CLIP to OVSS remains challenging because its image-level alignment training limits its performance in tasks requiring detailed local context. Our study delves into the impact of CLIP's [CLS] token on patch feature correlations, revealing a dominance of "global" patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach yields notable improvements in segmentation accuracy while maintaining semantic coherence across objects. Experiments show that our method outperforms CLIP by 22.3% on average across 9 segmentation benchmarks, surpassing existing state-of-the-art training-free methods. The code is publicly available at https://github.com/leaves162/CLIPtrase.
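To make the patch self-correlation idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it assumes pre-extracted CLIP ViT patch embeddings and text prompt embeddings, and the names recalibrated_segmentation_logits, patch_feats, text_feats, and tau are illustrative placeholders. It shows one plausible way to recalibrate patch-patch correlation and use it to refine patch-text similarity for training-free segmentation.

```python
# Hypothetical sketch of recalibrated patch self-correlation for
# training-free open-vocabulary segmentation (not the CLIPtrase code).
import torch
import torch.nn.functional as F

def recalibrated_segmentation_logits(patch_feats: torch.Tensor,
                                     text_feats: torch.Tensor,
                                     tau: float = 0.07) -> torch.Tensor:
    """patch_feats: (N, D) patch embeddings from CLIP's vision encoder,
    excluding the [CLS] token; text_feats: (C, D) class prompt embeddings
    from CLIP's text encoder. Returns (N, C) per-patch class logits."""
    p = F.normalize(patch_feats, dim=-1)   # (N, D) unit-norm patch features
    t = F.normalize(text_feats, dim=-1)    # (C, D) unit-norm text features

    # Patch-patch self-correlation; the row-wise softmax re-weights each
    # patch toward semantically similar patches, countering the dominance
    # of "global" patches over local feature discrimination.
    corr = p @ p.t()                       # (N, N) cosine similarities
    attn = (corr / tau).softmax(dim=-1)    # (N, N) row-stochastic weights

    # Propagate raw patch-text similarities through the recalibrated
    # correlation to obtain locally coherent logits.
    logits = p @ t.t()                     # (N, C) raw similarities
    return attn @ logits                   # (N, C) recalibrated logits
```

Taking an argmax over the class dimension of the returned logits and reshaping to the patch grid would give a coarse segmentation map under these assumptions.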
Pages: 139-156
Number of pages: 18