Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

Cited by: 0

Authors:

Shao, Tong [1]

Tian, Zhuotao [1]

Zhao, Hang [1]

Su, Jingyong [1]

Affiliations:

[1] Harbin Institute of Technology, Shenzhen, People's Republic of China

Source:

COMPUTER VISION - ECCV 2024, PT LXXXVI | 2025 / Vol. 15144

Funding:

National Natural Science Foundation of China;

Keywords:

CLIP; Training-free; Semantic Segmentation;

DOI:

10.1007/978-3-031-73016-0_9

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite this success, its application to OVSS is hampered by its image-level alignment training, which degrades performance on tasks that require detailed local context. Our study examines the impact of CLIP's [CLS] token on patch feature correlations, revealing a dominance of "global" patches that hinders local feature discrimination. To overcome this, we propose CLIPtrase, a novel training-free semantic segmentation strategy that enhances local feature awareness through recalibrated self-correlation among patches. This approach yields notable improvements in segmentation accuracy and in the ability to maintain semantic coherence across objects. Experiments show that CLIPtrase is 22.3% ahead of CLIP on average across 9 segmentation benchmarks and outperforms existing state-of-the-art training-free methods. The code is publicly available at https://github.com/leaves162/CLIPtrase.

Pages: 139-156

Page count: 18
