ClipSAM: CLIP and SAM collaboration for zero-shot anomaly segmentation

Cited by: 0
Authors
Li, Shengze [1 ]
Cao, Jianjian [1 ]
Ye, Peng [1 ]
Ding, Yuhan [1 ]
Tu, Chongjun [1 ]
Chen, Tao [1 ]
Affiliations
[1] Fudan Univ, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Zero-shot anomaly segmentation; CLIP and SAM collaboration; Cross-modal interaction; NETWORK;
DOI
10.1016/j.neucom.2024.129122
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Zero-Shot Anomaly Segmentation (ZSAS) aims to segment anomalies without any training data related to the test samples. Recently, while foundation models such as CLIP and SAM have shown potential for ZSAS, existing approaches that leverage either CLIP or SAM individually encounter critical limitations: (1) CLIP emphasizes global feature alignment across different inputs, leading to imprecise segmentation of local anomalous parts; and (2) SAM tends to generate numerous redundant masks without proper prompt constraints, resulting in complex post-processing requirements. In this paper, we introduce ClipSAM, a novel collaborative framework that integrates CLIP and SAM to address these issues in ZSAS. The insight behind ClipSAM is to employ CLIP's semantic understanding capability for anomaly localization and rough segmentation, which is then used as the prompt constraint for SAM to refine the anomaly segmentation results. Specifically, we propose a Unified Multi-scale Cross-modal Interaction (UMCI) module that learns local and global semantics about anomalous parts by interacting language features with visual features at both row-column and multi-scale levels, effectively reasoning about anomaly positions. Additionally, we develop a Multi-level Mask Refinement (MMR) module that guides SAM's output through multi-level spatial prompts derived from CLIP's localization, progressively merging the masks to refine the segmentation results. Extensive experiments validate the effectiveness of our approach, which achieves optimal segmentation performance on the MVTec-AD and VisA datasets.
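The CLIP-to-SAM hand-off described in the abstract can be pictured with a short sketch: a coarse anomaly heatmap (standing in for CLIP's rough localization) is thresholded into point and box prompts that constrain SAM's mask prediction. The helper names and the prompt-selection heuristic below are illustrative assumptions, not the paper's UMCI/MMR implementation; only the segment_anything predictor interface is taken from SAM's public release.

```python
# Sketch: turning a coarse CLIP-derived anomaly map into SAM prompts.
# Assumptions: `anomaly_map` is an HxW score map in [0, 1] produced by some
# CLIP-based scorer (not shown here); SAM is called through the public
# `segment_anything` predictor API. This is NOT the paper's UMCI/MMR code.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

def prompts_from_anomaly_map(anomaly_map: np.ndarray, thresh: float = 0.5):
    """Derive one point prompt (score peak) and one box prompt (bounding
    box of the thresholded region) from a coarse anomaly map."""
    ys, xs = np.where(anomaly_map >= thresh)
    if len(xs) == 0:                                   # nothing above threshold
        return None, None
    peak = np.unravel_index(np.argmax(anomaly_map), anomaly_map.shape)
    point = np.array([[peak[1], peak[0]]])             # (x, y) of the hottest pixel
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])  # XYXY box
    return point, box

def refine_with_sam(image: np.ndarray, anomaly_map: np.ndarray,
                    checkpoint: str = "sam_vit_h.pth") -> np.ndarray:
    """Use the rough CLIP-style localization as spatial prompts for SAM."""
    predictor = SamPredictor(sam_model_registry["vit_h"](checkpoint=checkpoint))
    predictor.set_image(image)                         # RGB uint8, HxWx3
    point, box = prompts_from_anomaly_map(anomaly_map)
    if point is None:
        return np.zeros(anomaly_map.shape, dtype=bool) # no anomaly located
    masks, scores, _ = predictor.predict(
        point_coords=point,
        point_labels=np.ones(len(point)),              # 1 = foreground point
        box=box,
        multimask_output=True,
    )
    return masks[np.argmax(scores)]                    # keep the best-scoring mask
```

In the paper, prompts are extracted at multiple levels and the resulting masks are progressively merged by the MMR module; the single point-plus-box heuristic above is only the simplest instance of that idea.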
Pages: 11
Related papers
50 records in total
  • [1] Enhancing Zero-Shot Anomaly Detection: CLIP-SAM Collaboration with Cascaded Prompts
    Hou, Yanning
    Xu, Ke
    Li, Junfa
    Ruan, Yanran
    Qiu, Jianfeng
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT II, 2025, 15032 : 46 - 60
  • [2] VCP-CLIP: A Visual Context Prompting Model for Zero-Shot Anomaly Segmentation
    Qu, Zhen
    Tao, Xian
    Prasad, Mukesh
    Shen, Fei
    Zhang, Zhengtao
    Gong, Xinyi
    Ding, Guiguang
    COMPUTER VISION - ECCV 2024, PT LXIX, 2025, 15127 : 301 - 317
  • [3] Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero-shot Medical Image Segmentation
    Aleem, Sidra
    Wang, Fangyijie
    Maniparambil, Mayug
    Arazo, Eric
    Dietlmeier, Julia
    Curran, Kathleen
    O'Connor, Noel E.
    Little, Suzanne
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2024, : 5184 - 5193
  • [4] ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation
    Zhou, Ziqin
    Lei, Yinjie
Zhang, Bowen
    Liu, Lingqiao
    Liu, Yifan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11175 - 11185
  • [5] Learning Mask-aware CLIP Representations for Zero-Shot Segmentation
    Jiao, Siyu
    Wei, Yunchao
    Wang, Yaowei
    Zhao, Yao
    Shi, Humphrey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] Online Zero-Shot Classification with CLIP
    Qian, Qi
    Hu, Juhua
    COMPUTER VISION - ECCV 2024, PT LXXVII, 2024, 15135 : 462 - 477
  • [7] AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection
    Cao, Yunkang
    Zhang, Jiangning
    Frittoli, Luca
    Cheng, Yuqi
    Shen, Weiming
    Boracchi, Giacomo
COMPUTER VISION - ECCV 2024, PT XXXV, 2025, 15093 : 55 - 72
  • [8] Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation
    Wang, Yuanbin
    Huang, Shaofei
    Gao, Yulu
    Wang, Zhen
    Wang, Rui
    Sheng, Kehua
    Zhang, Bo
    Liu, Si
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3745 - 3754
  • [9] Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM)
    Maquiling, Virmarie
    Byrne, Sean Anthony
    Niehorster, Diederick C.
    Nystrom, Marcus
    Kasneci, Enkelejda
    PROCEEDINGS OF THE ACM ON COMPUTER GRAPHICS AND INTERACTIVE TECHNIQUES, 2024, 7 (02)
  • [10] Zero-Shot Instance Segmentation
    Zheng, Ye
    Wu, Jiahong
    Qin, Yongqiang
    Zhang, Faen
    Cui, Li
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 2593 - 2602