Audiovisual Dependency Attention for Violence Detection in Videos

Cited by: 4
Authors
Pang, Wenfeng [1]
Xie, Wei [1]
He, Qianhua [1]
Li, Yanxiong [1]
Yang, Jichen [2]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China
[2] Guangdong Polytech Normal Univ, Sch Cyberspace Secur, Speech Informat Secur Lab, Guangzhou 510640, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Audiovisual dependency attention; dependency map; violence detection; SCENES; MOVIES; FUSION;
DOI
10.1109/TMM.2022.3184533
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Violence detection in videos can help maintain public order, detect crimes, and provide timely assistance. In this paper, we leverage multimodal information to determine whether successive frames contain violence. Specifically, we propose an audiovisual dependency attention (AVD-attention) module, modified from the co-attention architecture, to fuse visual and audio information, in place of commonly used fusion methods such as feature concatenation, addition, and score fusion. Because the dependency map of the AVD-attention module contains rich fusion information, we argue that it should be exploited more fully. A combination pooling method converts the dependency map into an attention vector, which can be treated either as a new feature that carries fusion information or as a mask for the attention feature map. Since some information in the input features may be lost after processing by attention modules, we employ a multimodal low-rank bilinear method, which considers all pairwise interactions between the two features at each time step, to supplement the module's output features with the original information. AVD-attention outperformed co-attention in experiments on the XD-Violence dataset. Our system outperforms state-of-the-art systems.
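The fusion pipeline described in the abstract can be illustrated with a rough NumPy sketch. It is not the authors' implementation: the function name `avd_attention_sketch`, the scaled dot-product form of the dependency map, the choice of max plus mean as the "combination pooling", the rank, and the random matrices standing in for learned weights are all assumptions made for illustration. Both modalities are also assumed to have been projected to the same dimension `d` beforehand.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avd_attention_sketch(visual, audio, rng, rank=8):
    """Hypothetical illustration; visual and audio are (T, d) arrays."""
    T, d = visual.shape
    # Dependency map (assumed form): affinity between every visual
    # time step and every audio time step, shape (T, T).
    dep = visual @ audio.T / np.sqrt(d)
    # Combination pooling (assumed: max plus mean over the audio axis),
    # normalized into an attention vector of shape (T,).
    attn_vec = softmax(dep.max(axis=1) + dep.mean(axis=1))
    # Co-attention style feature map: audio attended by each visual step.
    attended = softmax(dep, axis=1) @ audio          # (T, d)
    # Use the attention vector as a mask over the attention feature map.
    masked = attn_vec[:, None] * attended            # (T, d)
    # Low-rank bilinear term: project both inputs to a shared low rank,
    # multiply element-wise per time step, then project back.
    # Random matrices stand in for learned parameters.
    U = rng.standard_normal((d, rank)) / np.sqrt(d)
    V = rng.standard_normal((d, rank)) / np.sqrt(d)
    P = rng.standard_normal((rank, d)) / np.sqrt(rank)
    bilinear = ((visual @ U) * (audio @ V)) @ P      # (T, d)
    # Complement the attention output with the bilinear interaction.
    return masked + bilinear, attn_vec, dep

rng = np.random.default_rng(0)
v = rng.standard_normal((5, 16))   # 5 time steps of visual features
a = rng.standard_normal((5, 16))   # 5 time steps of audio features
fused, attn_vec, dep = avd_attention_sketch(v, a, rng)
print(fused.shape, attn_vec.shape, dep.shape)
```

In this sketch the element-wise product of the two low-rank projections approximates a full bilinear interaction between the audio and visual vectors at each time step with far fewer parameters, which is the usual motivation for low-rank bilinear pooling.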
Pages: 4922-4932
Number of pages: 11