ClipSAM: CLIP and SAM collaboration for zero-shot anomaly segmentation

Authors
Li, Shengze [1]
Cao, Jianjian [1]
Ye, Peng [1]
Ding, Yuhan [1]
Tu, Chongjun [1]
Chen, Tao [1]
Affiliations
[1] Fudan Univ, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Zero-shot anomaly segmentation; CLIP and SAM collaboration; Cross-modal interaction; NETWORK;
DOI
10.1016/j.neucom.2024.129122
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Zero-Shot Anomaly Segmentation (ZSAS) aims to segment anomalies without any training data related to the test samples. Foundation models such as CLIP and SAM have recently shown potential for ZSAS, but existing approaches that use either CLIP or SAM alone encounter critical limitations: (1) CLIP emphasizes global feature alignment across different inputs, leading to imprecise segmentation of local anomalous parts; and (2) SAM tends to generate numerous redundant masks without proper prompt constraints, resulting in complex post-processing requirements. In this paper, we introduce ClipSAM, a novel collaborative framework that integrates CLIP and SAM to address these issues in ZSAS. The insight behind ClipSAM is to employ CLIP's semantic understanding capability for anomaly localization and rough segmentation, and then to use that result as prompt constraints for SAM to refine the anomaly segmentation. Specifically, we propose a Unified Multi-scale Cross-modal Interaction (UMCI) module that learns local and global semantics about anomalous parts by interacting language features with visual features at both row-column and multi-scale levels, effectively reasoning about anomaly positions. Additionally, we develop a Multi-level Mask Refinement (MMR) module that guides SAM's output through multi-level spatial prompts derived from CLIP's localization, progressively merging the masks to refine the segmentation results. Extensive experiments validate the effectiveness of our approach, which achieves the best segmentation performance on the MVTec-AD and VisA datasets.
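The coarse-to-fine idea in the abstract (a rough localization map supplies spatial prompts, and candidate masks are then merged) can be illustrated with a minimal NumPy sketch. This is not the authors' MMR module: the function names, the 0.5 threshold, and the overlap rule for merging are all assumptions chosen for illustration, and no real CLIP or SAM call is made. The candidate masks stand in for whatever a promptable segmenter would return.

```python
import numpy as np


def prompts_from_coarse_map(coarse, threshold=0.5):
    """Derive one box prompt and one point prompt from a coarse score map.

    `coarse` is a 2-D array of anomaly scores in [0, 1] (hypothetical input).
    Returns (box, point) with box = (x_min, y_min, x_max, y_max) and
    point = (x, y), or (None, None) if no score reaches the threshold.
    """
    region = coarse >= threshold
    if not region.any():
        return None, None
    ys, xs = np.nonzero(region)
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    # Use the highest-scoring pixel as the point prompt.
    y, x = np.unravel_index(np.argmax(coarse), coarse.shape)
    return box, (int(x), int(y))


def merge_masks(candidates, coarse, threshold=0.5, min_overlap=0.5):
    """Union the candidate masks that lie mostly inside the coarse region.

    The overlap rule is an illustrative assumption, not the paper's rule.
    """
    region = coarse >= threshold
    merged = np.zeros_like(region)
    for cand in candidates:
        cand = cand.astype(bool)
        area = cand.sum()
        # Keep a candidate only if enough of its area falls in the region.
        if area and (cand & region).sum() / area >= min_overlap:
            merged |= cand
    return merged
```

In a real pipeline, the box and point would be passed to a promptable segmenter, and its returned masks would take the place of `candidates`; how ClipSAM actually builds and merges its multi-level prompts is described in the paper itself.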
Pages: 11
Related papers
50 in total
  • [31] Robust Zero-Shot Crowd Counting and Localization With Adaptive Resolution SAM
    Wan, Jia
    Wu, Qiangqiang
    Lin, Wei
    Chan, Antoni
    COMPUTER VISION-ECCV 2024, PT LVII, 2025, 15115 : 478 - 495
  • [32] No More Training: SAM's Zero-Shot Transfer Capabilities for Cost-Efficient Medical Image Segmentation
    Gutierrez, Juan D.
    Rodriguez-Echeverria, Roberto
    Delgado, Emilio
    Rodrigo, Miguel Angel Suero
    Sanchez-Figueroa, Fernando
    IEEE ACCESS, 2024, 12 : 24205 - 24216
  • [33] CLIP4HOI: Towards Adapting CLIP for Practical Zero-Shot HOI Detection
    Mao, Yunyao
    Deng, Jiajun
    Zhou, Wengang
    Li, Li
    Fang, Yao
    Li, Houqiang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [34] ANOMALY DETECTION IN EM IMAGES - A ZERO-SHOT LEARNING APPROACH
    Mahalingam, Gayathri
    Jiao, Tong
    Schneider-Mizell, Casey
    Bodor, Agnes
    Torres, Russel
    Takeno, Marc
    Buchanan, JoAnn
    Bumbarger, Daniel
    Yin, Wenjing
    Brittain, Derrick
    Reid, Clay
    Da Costa, Nuno
    2022 IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (IEEE ISBI 2022), 2022,
  • [35] Feature Enhanced Projection Network for Zero-shot Semantic Segmentation
    Lu, Hongchao
    Fang, Longwei
    Lin, Matthieu
    Deng, Zhidong
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 14011 - 14017
  • [36] Weakly supervised classification model for zero-shot semantic segmentation
    Shen, Fengli
    Wang, Zong-Hui
    Lu, Zhe-Ming
    ELECTRONICS LETTERS, 2020, 56 (23) : 1247 - 1249
  • [37] Zero-shot domain adaptation with enhanced consistency for semantic segmentation
    Yang, Jiming
    Da, Feipeng
    Hong, Ru
    Cai, Zeyu
    Gai, Shaoyan
    COMPUTERS & ELECTRICAL ENGINEERING, 2025, 123
  • [38] Advancing zero-shot semantic segmentation through attribute correlations
    Zhang, Runtong
    Meng, Fanman
    Chen, Shuai
    Wu, Qingbo
    Xu, Linfeng
    Li, Hongliang
    NEUROCOMPUTING, 2024, 594
  • [39] TagCLIP: Improving Discrimination Ability of Zero-Shot Semantic Segmentation
    Li, Jingyao
    Chen, Pengguang
    Qian, Shengju
    Liu, Shu
    Jia, Jiaya
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 11287 - 11297
  • [40] Bidirectional Mask Selection for Zero-Shot Referring Image Segmentation
    Li, Wenhui
    Pang, Chao
    Nie, Weizhi
    Tian, Hongshuo
    Liu, An-An
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 911 - 921