Audiovisual Dependency Attention for Violence Detection in Videos

Cited by: 4
Authors
Pang, Wenfeng [1 ]
Xie, Wei [1 ]
He, Qianhua [1 ]
Li, Yanxiong [1 ]
Yang, Jichen [2 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China
[2] Guangdong Polytech Normal Univ, Sch Cyberspace Secur, Speech Informat Secur Lab, Guangzhou 510640, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Audiovisual dependency attention; dependency map; violence detection; SCENES; MOVIES; FUSION;
DOI
10.1109/TMM.2022.3184533
Chinese Library Classification (CLC) Number
TP [Automation Technology; Computer Technology];
Discipline Classification Code
0812;
Abstract
Violence detection in videos can help maintain public order, detect crimes, and provide timely assistance. In this paper, we leverage multimodal information to determine whether successive frames contain violence. Specifically, we propose an audiovisual dependency attention (AVD-attention) module, modified from the co-attention architecture, to fuse visual and audio information, in contrast to commonly used strategies such as feature concatenation, addition, and score fusion. Because the AVD-attention module's dependency map carries rich fusion information, we argue that it should be exploited more fully: a combination pooling method converts the dependency map into an attention vector, which can be regarded either as a new feature encoding the fused information or as a mask over the attention feature map. Since some information in the input features may be lost after processing by attention modules, we further employ a multimodal low-rank bilinear method that models all pairwise interactions between the two features at each time step, complementing the module's output features with the original information. AVD-attention outperforms co-attention in experiments on the XD-Violence dataset, and our system outperforms state-of-the-art systems.
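Illustrative sketch
The abstract describes three ingredients: a cross-modal dependency map, a combination pooling step that turns the map into an attention vector, and a multimodal low-rank bilinear term that re-injects interactions the attention may discard. The minimal PyTorch sketch below illustrates those ideas under our own assumptions about shapes and names (AVDependencyAttentionSketch, vis_dim, aud_dim, dim, rank are all hypothetical); it is not the authors' implementation.

# Minimal sketch of the ideas summarized in the abstract (illustrative only):
# a cross-modal dependency map, combined max/average pooling that turns the map
# into an attention vector, and a low-rank bilinear term that re-injects pairwise
# interactions between the two modalities. Names and dimensions are assumptions.
import torch
import torch.nn as nn


class AVDependencyAttentionSketch(nn.Module):
    def __init__(self, vis_dim=1024, aud_dim=128, dim=256, rank=64):
        super().__init__()
        self.proj_v = nn.Linear(vis_dim, dim)   # project visual features to a shared space
        self.proj_a = nn.Linear(aud_dim, dim)   # project audio features to the same space
        # low-rank bilinear factors (Hadamard-product approximation of a full bilinear map)
        self.U = nn.Linear(dim, rank, bias=False)
        self.W = nn.Linear(dim, rank, bias=False)
        self.P = nn.Linear(rank, dim, bias=False)

    def forward(self, vis, aud):
        # vis: (B, T, vis_dim) visual snippet features; aud: (B, T, aud_dim) audio features
        v = self.proj_v(vis)                     # (B, T, dim)
        a = self.proj_a(aud)                     # (B, T, dim)

        # Dependency map: pairwise affinities between visual and audio time steps.
        dep = torch.matmul(v, a.transpose(1, 2)) / v.size(-1) ** 0.5   # (B, T, T)

        # Combination pooling: max + average pooling over one axis of the map yields a
        # per-time-step attention vector for each stream.
        attn_v = torch.softmax(dep.max(dim=2).values + dep.mean(dim=2), dim=1)  # (B, T)
        attn_a = torch.softmax(dep.max(dim=1).values + dep.mean(dim=1), dim=1)  # (B, T)

        v_att = v * attn_v.unsqueeze(-1)         # reweighted visual features
        a_att = a * attn_a.unsqueeze(-1)         # reweighted audio features

        # Low-rank bilinear interaction of the original features, added back so that
        # information suppressed by the attention is not lost.
        bilinear = self.P(torch.tanh(self.U(v)) * torch.tanh(self.W(a)))  # (B, T, dim)
        return v_att + bilinear, a_att + bilinear


if __name__ == "__main__":
    vis = torch.randn(2, 16, 1024)   # e.g. I3D snippet features
    aud = torch.randn(2, 16, 128)    # e.g. VGGish frame features
    out_v, out_a = AVDependencyAttentionSketch()(vis, aud)
    print(out_v.shape, out_a.shape)  # torch.Size([2, 16, 256]) twice

The max/average combination is one plausible reading of the "combination pooling" mentioned in the abstract; the paper itself should be consulted for the exact pooling and fusion details.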
Pages: 4922 - 4932
Number of pages: 11
Related Papers
50 records in total
  • [41] FTCF: Full temporal cross fusion network for violence detection in videos
    Tan, Zhenhua
    Xia, Zhenche
    Wang, Pengfei
    Ding, Chang
    Zhai, Weichao
    APPLIED INTELLIGENCE, 2023, 53 : 4218 - 4230
  • [42] Revisiting vision-based violence detection in videos: A critical analysis
    Kaur, Gurmeet
    Singh, Sarbjeet
    NEUROCOMPUTING, 2024, 597
  • [43] Transformer and Adaptive Threshold Sliding Window for Improving Violence Detection in Videos
    Rendon-Segador, Fernando J.
    Alvarez-Garcia, Juan A.
    Soria-Morillo, Luis M.
    SENSORS, 2024, 24 (16)
  • [44] Trajectory-Pooled Deep Convolutional Networks for Violence Detection in Videos
    Meng, Zihan
    Yuan, Jiabin
    Li, Zhen
    COMPUTER VISION SYSTEMS, ICVS 2017, 2017, 10528 : 437 - 447
  • [45] Real-Time Violence Detection in Videos Using Dynamic Images
    Marques Guedes, Ademir Rafael
    Chavez, Guillermo Camara
    2020 XLVI LATIN AMERICAN COMPUTING CONFERENCE (CLEI 2020), 2021, : 503 - 511
  • [46] A Comprehensive Review on Vision-Based Violence Detection in Surveillance Videos
    Ullah, Fath U. Min
    Obaidat, Mohammad S.
    Ullah, Amin
    Muhammad, Khan
    Hijji, Mohammad
    Baik, Sung Wook
    ACM COMPUTING SURVEYS, 2023, 55 (10)
  • [47] Summarizing Videos with Attention
    Fajtl, Jiri
    Sokeh, Hajar Sadeghi
    Argyriou, Vasileios
    Monekosso, Dorothy
    Remagnino, Paolo
    COMPUTER VISION - ACCV 2018 WORKSHOPS, 2019, 11367 : 39 - 54
  • [48] AUDIOVISUAL TEST OF DIVIDED ATTENTION
    BACH, MJ
    BRUCE, DL
    PERCEPTUAL AND MOTOR SKILLS, 1976, 43 (01) : 51 - 57
  • [49] Audiovisual interaction for spatial attention
    Shioiri, Satoshi
    Ono, Shin
    Wu, Wei
    Sakamoto, Shuichi
    Teraoka, Ryo
    Sato, Yoshiyuki
    Hatori, Yasuhiro
    Tseng, Chia-huei
    Kuriki, Ichiro
    PERCEPTION, 2021, 50 (1_SUPPL) : 41 - 41
  • [50] Influence-Aware Attention Networks for Anomaly Detection in Surveillance Videos
    Zhang, Sijia
    Gong, Maoguo
    Xie, Yu
    Qin, A. K.
    Li, Hao
    Gao, Yuan
    Ong, Yew-Soon
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5427 - 5437