Delving into CLIP latent space for Video Anomaly Recognition

Cited by: 1
Authors
Zanella, Luca [1]
Liberatori, Benedetta [1]
Menapace, Willi [1]
Poiesi, Fabio [2]
Wang, Yiming [2]
Ricci, Elisa [1, 2]
Affiliations
[1] Univ Trento, Trento, Italy
[2] Fdn Bruno Kessler, Trento, Italy
Keywords
Video anomaly detection and recognition; Multi-modal learning
DOI
10.1016/j.cviu.2024.104163
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We tackle the complex problem of detecting and recognising anomalies in surveillance videos at the frame level, utilising only video-level supervision. We introduce the novel method AnomalyCLIP, the first to combine Vision and Language Models (VLMs), such as CLIP, with multiple instance learning for joint video anomaly detection and classification. Our approach specifically involves manipulating the latent CLIP feature space to identify the normal event subspace, which in turn allows us to effectively learn text-driven directions for abnormal events. When anomalous frames are projected onto these directions, they exhibit a large feature magnitude if they belong to a particular class. We also leverage a computationally efficient Transformer architecture to model short- and long-term temporal dependencies between frames, ultimately producing the final anomaly score and class prediction probabilities. We compare AnomalyCLIP against state-of-the-art methods on three major anomaly detection benchmarks, i.e. ShanghaiTech, UCF-Crime, and XD-Violence, and empirically show that it outperforms baselines in recognising video anomalies. Project website and code are available at https://lucazanella.github.io/AnomalyCLIP/.
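The projection idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the normal event subspace is summarised by the mean of normal-frame features, that each class direction is the unit vector from that mean to a class text embedding, and that a video-level score is the mean of the top-k frame scores (a common multiple instance learning choice). All function names and the 2-D values are hypothetical.

```python
import math

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def text_directions(text_embeddings, normal_prototype):
    """Unit vectors from the normal prototype to each class text embedding.

    Assumes no text embedding coincides with the prototype (non-zero norm).
    """
    directions = []
    for t in text_embeddings:
        diff = [a - b for a, b in zip(t, normal_prototype)]
        norm = math.sqrt(sum(x * x for x in diff))
        directions.append([x / norm for x in diff])
    return directions

def project(frame_features, normal_prototype, directions):
    """Project each re-centred frame feature onto each class direction."""
    projections = []
    for f in frame_features:
        centred = [a - b for a, b in zip(f, normal_prototype)]
        projections.append(
            [sum(c * d for c, d in zip(centred, direction)) for direction in directions]
        )
    return projections

def topk_video_score(frame_scores, k):
    """Mean of the k largest frame scores (assumed MIL-style aggregation)."""
    top = sorted(frame_scores, reverse=True)[:k]
    return sum(top) / len(top)

# Hypothetical 2-D stand-ins for CLIP features.
normal_frames = [[1.0, 0.0], [1.0, 0.0]]
class_text_embeddings = [[1.0, 2.0], [3.0, 0.0]]  # two abnormal classes
video_frames = [[1.0, 0.0], [1.0, 3.0], [2.0, 0.0]]

prototype = mean_vector(normal_frames)
directions = text_directions(class_text_embeddings, prototype)
projections = project(video_frames, prototype, directions)

# Per-frame score: largest projection magnitude over the class directions.
frame_scores = [max(p) for p in projections]
frame_classes = [p.index(max(p)) for p in projections]
video_score = topk_video_score(frame_scores, k=2)
```

In this toy setup the normal frame projects to zero on both directions, while each anomalous frame has a large projection only along the direction of its own class, which mirrors the behaviour the abstract describes.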
Pages: 13