Zero-Shot Anomaly Segmentation (ZSAS) aims to segment anomalies without any training data related to the test samples. While foundation models such as CLIP and SAM have recently shown potential for ZSAS, existing approaches that rely on either model alone face critical limitations: (1) CLIP emphasizes global feature alignment across different inputs, leading to imprecise segmentation of local anomalous parts; and (2) SAM tends to generate numerous redundant masks without proper prompt constraints, resulting in complex post-processing requirements. In this paper, we introduce ClipSAM, a novel collaborative framework that integrates CLIP and SAM to address these issues in ZSAS. The key insight behind ClipSAM is to employ CLIP's semantic understanding for anomaly localization and rough segmentation, whose output then serves as prompt constraints that guide SAM in refining the segmentation results. Specifically, we propose a Unified Multi-scale Cross-modal Interaction (UMCI) module that learns local and global semantics of anomalous parts by interacting language features with visual features at both row-column and multi-scale levels, effectively reasoning about anomaly positions. Additionally, we develop a Multi-level Mask Refinement (MMR) module that guides SAM's output through multi-level spatial prompts derived from CLIP's localization and progressively merges the resulting masks to refine the segmentation. Extensive experiments validate the effectiveness of our approach, achieving state-of-the-art segmentation performance on the MVTec-AD and VisA datasets.
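
To make the CLIP-to-SAM handoff concrete, the sketch below shows one simple way to turn a coarse anomaly heatmap (such as the rough segmentation produced by CLIP-based localization) into box and point prompts for SAM. It is a minimal illustration under our own assumptions; the function name, threshold, and connected-component heuristic are illustrative and do not reproduce the UMCI or MMR modules described above.

```python
import numpy as np
from scipy import ndimage


def prompts_from_anomaly_map(anomaly_map: np.ndarray, threshold: float = 0.5):
    """Derive SAM-style spatial prompts from a coarse anomaly heatmap.

    anomaly_map: 2-D array of per-pixel anomaly scores in [0, 1]
                 (hypothetical output of CLIP-based rough localization).
    Returns a list of (box, point) pairs, where box = (x0, y0, x1, y1)
    in pixel coordinates and point = (x, y) is the region's score peak.
    """
    binary = anomaly_map >= threshold              # rough binary segmentation
    labeled, _ = ndimage.label(binary)             # connected anomalous regions
    prompts = []
    for label_idx, region_slice in enumerate(ndimage.find_objects(labeled), start=1):
        ys, xs = region_slice
        box = (xs.start, ys.start, xs.stop, ys.stop)
        # Peak-score pixel inside this region serves as a point prompt.
        region_mask = labeled[region_slice] == label_idx
        local = np.where(region_mask, anomaly_map[region_slice], -np.inf)
        py, px = np.unravel_index(np.argmax(local), local.shape)
        prompts.append((box, (xs.start + px, ys.start + py)))
    return prompts


# Example: a synthetic 64x64 heatmap with one anomalous hot spot.
heatmap = np.zeros((64, 64), dtype=np.float32)
heatmap[20:30, 40:55] = 0.9
print(prompts_from_anomaly_map(heatmap))  # one box prompt plus one point prompt
```

In an actual pipeline, each (box, point) pair would be passed to SAM's prompt encoder, and the per-prompt masks would then be merged; ClipSAM performs this refinement at multiple prompt levels rather than the single threshold used here.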