Scale-Aware Spatio-Temporal Relation Learning for Video Anomaly Detection

Cited by: 15
Authors
Li, Guoqiu [1]
Cai, Guanxiong [2]
Zeng, Xingyu [2]
Zhao, Rui [2, 3]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] SenseTime Res, Shanghai, Peoples R China
[3] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai, Peoples R China
Keywords
Scale-aware; Weakly-supervised video anomaly detection; Spatio-temporal relation modeling;
DOI
10.1007/978-3-031-19772-7_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent progress in video anomaly detection (VAD) has shown that feature discrimination is the key to effectively distinguishing anomalies from normal events. We observe that many anomalous events occur in limited local regions, and the severe background noise increases the difficulty of feature learning. In this paper, we propose a scale-aware weakly supervised learning approach to capture local and salient anomalous patterns from the background, using only coarse video-level labels as supervision. We achieve this by segmenting frames into non-overlapping patches and then capturing inconsistencies among different regions through our patch spatial relation (PSR) module, which consists of self-attention mechanisms and dilated convolutions. To address the scale variation of anomalies and enhance the robustness of our method, a multi-scale patch aggregation method is further introduced to enable local-to-global spatial perception by merging features of patches with different scales. Considering the importance of temporal cues, we extend the relation modeling from the spatial domain to the spatio-temporal domain with the help of the existing video temporal relation network to effectively encode the spatio-temporal dynamics in the video. Experimental results show that our proposed method achieves new state-of-the-art performance on the UCF-Crime and ShanghaiTech benchmarks. Code is available at https://github.com/nutuniv/SSRL.
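The patch-based idea in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation (see the repository linked above for that): it only shows how a frame might be cut into non-overlapping patches at several scales, how plain self-attention can relate the patches within one scale, and how the per-scale results might be merged. The function names `split_into_patches`, `self_attention`, and `multi_scale_descriptor` are hypothetical, the attention has no learned projections, and mean pooling stands in for the paper's feature extractor and aggregation. Dilated convolutions and the temporal relation network are left out entirely.

```python
import numpy as np


def split_into_patches(frame, patch_h, patch_w):
    """Split an (H, W, C) array into non-overlapping patches.

    Returns an array of shape (num_patches, patch_h, patch_w, C),
    ordered row by row over the patch grid.
    """
    H, W, C = frame.shape
    if H % patch_h or W % patch_w:
        raise ValueError("frame size must be divisible by the patch size")
    gh, gw = H // patch_h, W // patch_w
    patches = frame.reshape(gh, patch_h, gw, patch_w, C)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch_h, patch_w, C)
    return patches.reshape(gh * gw, patch_h, patch_w, C)


def self_attention(x):
    """Scaled dot-product self-attention over rows of x, shape (N, D).

    Assumption: queries, keys, and values are all x itself
    (no learned projection matrices).
    """
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x


def multi_scale_descriptor(frame, patch_sizes):
    """Relate patches at each scale, then average the scales (illustrative only)."""
    per_scale = []
    for p in patch_sizes:
        patches = split_into_patches(frame, p, p)
        feats = patches.mean(axis=(1, 2))       # (N, C): one mean feature per patch
        related = self_attention(feats)         # (N, C): patch-to-patch relations
        per_scale.append(related.mean(axis=0))  # (C,): pool over patches
    return np.mean(per_scale, axis=0)           # merge scales by simple averaging


rng = np.random.default_rng(0)
frame = rng.random((8, 8, 3))  # toy "frame": 8x8 pixels, 3 channels
desc = multi_scale_descriptor(frame, [2, 4, 8])
print(desc.shape)
```

With patch sizes 2, 4, and 8 on an 8x8 frame, the sketch relates 16, 4, and 1 patches respectively, which mirrors the local-to-global progression the abstract describes.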
Pages: 333-350
Number of pages: 18