ClipSAM: CLIP and SAM collaboration for zero-shot anomaly segmentation

Cited by: 0
Authors
Li, Shengze [1 ]
Cao, Jianjian [1 ]
Ye, Peng [1 ]
Ding, Yuhan [1 ]
Tu, Chongjun [1 ]
Chen, Tao [1 ]
Affiliations
[1] Fudan Univ, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Zero-shot anomaly segmentation; CLIP and SAM collaboration; Cross-modal interaction; NETWORK;
DOI
10.1016/j.neucom.2024.129122
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Zero-Shot Anomaly Segmentation (ZSAS) aims to segment anomalies without any training data related to the test samples. Recently, while foundation models such as CLIP and SAM have shown potential for ZSAS, existing approaches that leverage either CLIP or SAM individually encounter critical limitations: (1) CLIP emphasizes global feature alignment across different inputs, leading to imprecise segmentation of local anomalous parts; and (2) SAM tends to generate numerous redundant masks without proper prompt constraints, resulting in complex post-processing requirements. In this paper, we introduce ClipSAM, a novel collaborative framework that integrates CLIP and SAM to address these issues in ZSAS. The insight behind ClipSAM is to employ CLIP's semantic understanding capability for anomaly localization and rough segmentation, which is then used as the prompt constraint for SAM to refine the anomaly segmentation results. Specifically, we propose a Unified Multi-scale Cross-modal Interaction (UMCI) module that learns local and global semantics about anomalous parts by interacting language features with visual features at both row-column and multi-scale levels, effectively reasoning about anomaly positions. Additionally, we develop a Multi-level Mask Refinement (MMR) module that guides SAM's output through multi-level spatial prompts derived from CLIP's localization, progressively merging the masks to refine the segmentation results. Extensive experiments validate the effectiveness of our approach, which achieves optimal segmentation performance on the MVTec-AD and VisA datasets.
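The CLIP-to-SAM hand-off described in the abstract can be pictured with a short sketch: a coarse anomaly heatmap (standing in for CLIP's rough localization) is thresholded into point and box prompts that constrain SAM's mask prediction. The helper names and the prompt-selection heuristic below are illustrative assumptions, not the paper's UMCI/MMR implementation; only the segment_anything predictor interface is taken from SAM's public release.

```python
# Sketch: turning a coarse CLIP-derived anomaly map into SAM prompts.
# Assumptions: `anomaly_map` is an HxW score map in [0, 1] produced by some
# CLIP-based scorer (not shown here); SAM is called through the public
# `segment_anything` predictor API. This is NOT the paper's UMCI/MMR code.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

def prompts_from_anomaly_map(anomaly_map: np.ndarray, thresh: float = 0.5):
    """Derive one point prompt (score peak) and one box prompt (bounding
    box of the thresholded region) from a coarse anomaly map."""
    ys, xs = np.where(anomaly_map >= thresh)
    if len(xs) == 0:                                   # nothing above threshold
        return None, None
    peak = np.unravel_index(np.argmax(anomaly_map), anomaly_map.shape)
    point = np.array([[peak[1], peak[0]]])             # (x, y) of the hottest pixel
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])  # XYXY box
    return point, box

def refine_with_sam(image: np.ndarray, anomaly_map: np.ndarray,
                    checkpoint: str = "sam_vit_h.pth") -> np.ndarray:
    """Use the rough CLIP-style localization as spatial prompts for SAM."""
    predictor = SamPredictor(sam_model_registry["vit_h"](checkpoint=checkpoint))
    predictor.set_image(image)                         # RGB uint8, HxWx3
    point, box = prompts_from_anomaly_map(anomaly_map)
    if point is None:
        return np.zeros(anomaly_map.shape, dtype=bool) # no anomaly located
    masks, scores, _ = predictor.predict(
        point_coords=point,
        point_labels=np.ones(len(point)),              # 1 = foreground point
        box=box,
        multimask_output=True,
    )
    return masks[np.argmax(scores)]                    # keep the best-scoring mask
```

In the paper, prompts are extracted at multiple levels and the resulting masks are progressively merged by the MMR module; the single point-plus-box heuristic above is only the simplest instance of that idea.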
Pages: 11
Related papers
50 records in total
  • [1] Enhancing Zero-Shot Anomaly Detection: CLIP-SAM Collaboration with Cascaded Prompts
    Hou, Yanning
    Xu, Ke
    Li, Junfa
    Ruan, Yanran
    Qiu, Jianfeng
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT II, 2025, 15032 : 46 - 60
  • [2] VCP-CLIP: A Visual Context Prompting Model for Zero-Shot Anomaly Segmentation
    Qu, Zhen
    Tao, Xian
    Prasad, Mukesh
    Shen, Fei
    Zhang, Zhengtao
    Gong, Xinyi
    Ding, Guiguang
    COMPUTER VISION - ECCV 2024, PT LXIX, 2025, 15127 : 301 - 317
  • [3] Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero-shot Medical Image Segmentation
    Aleem, Sidra
    Wang, Fangyijie
    Maniparambil, Mayug
    Arazo, Eric
    Dietlmeier, Julia
    Curran, Kathleen
    O'Connor, Noel E.
    Little, Suzanne
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2024, : 5184 - 5193
  • [4] ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation
    Zhou, Ziqin
    Lei, Yinjie
Zhang, Bowen
    Liu, Lingqiao
    Liu, Yifan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11175 - 11185
  • [5] Learning Mask-aware CLIP Representations for Zero-Shot Segmentation
    Jiao, Siyu
    Wei, Yunchao
    Wang, Yaowei
    Zhao, Yao
    Shi, Humphrey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] Online Zero-Shot Classification with CLIP
    Qian, Qi
    Hu, Juhua
    COMPUTER VISION - ECCV 2024, PT LXXVII, 2024, 15135 : 462 - 477
  • [7] AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection
    Cao, Yunkang
    Zhang, Jiangning
    Frittoli, Luca
    Cheng, Yuqi
    Shen, Weiming
    Boracchi, Giacomo
COMPUTER VISION - ECCV 2024, PT XXXV, 2025, 15093 : 55 - 72
  • [8] Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation
    Wang, Yuanbin
    Huang, Shaofei
    Gao, Yulu
    Wang, Zhen
    Wang, Rui
    Sheng, Kehua
    Zhang, Bo
    Liu, Si
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3745 - 3754
  • [9] Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM)
    Maquiling, Virmarie
    Byrne, Sean Anthony
    Niehorster, Diederick C.
    Nystrom, Marcus
    Kasneci, Enkelejda
    PROCEEDINGS OF THE ACM ON COMPUTER GRAPHICS AND INTERACTIVE TECHNIQUES, 2024, 7 (02)
  • [10] Zero-Shot Instance Segmentation
    Zheng, Ye
    Wu, Jiahong
    Qin, Yongqiang
    Zhang, Faen
    Cui, Li
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 2593 - 2602