Audiovisual Dependency Attention for Violence Detection in Videos

Cited by: 4
Authors
Pang, Wenfeng [1 ]
Xie, Wei [1 ]
He, Qianhua [1 ]
Li, Yanxiong [1 ]
Yang, Jichen [2 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China
[2] Guangdong Polytech Normal Univ, Sch Cyberspace Secur, Speech Informat Secur Lab, Guangzhou 510640, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Audiovisual dependency attention; dependency map; violence detection; SCENES; MOVIES; FUSION;
DOI
10.1109/TMM.2022.3184533
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Violence detection in videos can help maintain public order, detect crimes, and provide timely assistance. In this paper, we leverage multimodal information to determine whether successive frames contain violence. Specifically, we propose an audiovisual dependency attention (AVD-attention) module, modified from the co-attention architecture, to fuse visual and audio information, in contrast to commonly used methods such as feature concatenation, addition, and score fusion. Because the AVD-attention module's dependency map contains rich fusion information, we argue that it should be exploited more fully. A combination pooling method converts the dependency map into an attention vector, which can be regarded either as a new feature that encodes fusion information or as a mask over the attention feature map. Since some information in the input features may be lost after processing by attention modules, we employ a multimodal low-rank bilinear method that considers all pairwise interactions between the two features at each time step to restore the original information in the module's output features. AVD-attention outperformed co-attention in experiments on the XD-Violence dataset, and our system outperforms state-of-the-art systems.
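The abstract's pipeline (audio-visual dependency map, pooling to an attention vector, and a low-rank bilinear complement) can be sketched in a few lines. This is an illustrative approximation, not the authors' implementation: the choice of mean-plus-max as "combination pooling", the scaled dot-product dependency map, and the random low-rank projections `Wa`/`Wv` are all assumptions made here for demonstration.

```python
import numpy as np

def avd_attention_sketch(A, V, rank=8, seed=0):
    """Illustrative sketch of the AVD-attention idea (not the paper's code).

    A: audio features, shape (T, d); V: visual features, shape (T, d).
    Returns attended fused features, the attention vector, and a
    low-rank bilinear interaction term.
    """
    rng = np.random.default_rng(seed)
    T, d = A.shape

    # Dependency map: co-attention-style pairwise affinities between
    # audio and visual features across time steps.
    D = A @ V.T / np.sqrt(d)                              # (T, T)

    # "Combination pooling" (assumed here: mean + max pooling) collapses
    # the dependency map into one score per time step ...
    pooled = 0.5 * D.mean(axis=1) + 0.5 * D.max(axis=1)   # (T,)

    # ... which a softmax turns into an attention vector over time.
    attn = np.exp(pooled - pooled.max())
    attn = attn / attn.sum()                              # sums to 1

    # The attention vector acts as a mask on the (here, summed) modalities.
    fused = attn[:, None] * (A + V)                       # (T, d)

    # Low-rank bilinear complement: project both modalities to a rank-r
    # space and take their elementwise product, approximating the full
    # pairwise (bilinear) interaction without a d x d parameter matrix
    # per output dimension.
    Wa = rng.standard_normal((d, rank)) / np.sqrt(d)      # assumed weights
    Wv = rng.standard_normal((d, rank)) / np.sqrt(d)
    bilinear = (A @ Wa) * (V @ Wv)                        # (T, rank)

    return fused, attn, bilinear
```

In the paper the bilinear term is described as complementing the attention output with the input features' pairwise interactions; here it is simply returned alongside the fused features.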
Pages: 4922-4932
Number of pages: 11
Related Papers
50 items in total
  • [21] Detection of Deepfake Videos Using Long-Distance Attention
    Lu, Wei
    Liu, Lingyi
    Zhang, Bolin
    Luo, Junwei
    Zhao, Xianfeng
    Zhou, Yicong
    Huang, Jiwu
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (07) : 9366 - 9379
  • [22] TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos
    McIntosh, Declan
    Marques, Tunai Porto
    Albu, Alexandra Branzan
    Rountree, Rodney
    De Leo, Fabio
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3318 - 3324
  • [23] Detection and Attention for Auditory, Visual, and Audiovisual Speech in Children with Hearing Loss
    Jerger, Susan
    Damian, Markus F.
    Karl, Cassandra
    Abdi, Herve
    EAR AND HEARING, 2020, 41 (03): : 508 - 520
  • [24] Deep anomaly detection through visual attention in surveillance videos
    Nasaruddin, Nasaruddin
    Muchtar, Kahlil
    Afdhal, Afdhal
    Dwiyantoro, Alvin Prayuda Juniarta
    JOURNAL OF BIG DATA, 7
  • [25] Multimodal-Attention Fusion for the Detection of Questionable Content in Videos
    Morales, Arnold
    Baharlouei, Elaheh
    Solorio, Thamar
    Escalante, Hugo Jair
    PATTERN RECOGNITION, MCPR 2024, 2024, 14755 : 188 - 199
  • [26] Violence mirrored in audiovisual culture
    Chelysheva, I. V.
    MEDIAOBRAZOVANIE-MEDIA EDUCATION, 2006, (03): : 124 - 125
  • [27] Dependency Structure-Enhanced Graph Attention Networks for Event Detection
    Wan, Qizhi
    Wan, Changxuan
    Xiao, Keli
    Lu, Kun
    Li, Chenliang
    Liu, Xiping
    Liu, Dexi
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19098 - 19106
  • [28] VIDEOS AND VIOLENCE ON THE PERIPHERY
    RICHARDS, P
    IDS BULLETIN-INSTITUTE OF DEVELOPMENT STUDIES, 1994, 25 (02): : 88 - 93
  • [29] FunnyNet: Audiovisual Learning of Funny Moments in Videos
    Liu, Zhi-Song
    Courant, Robin
    Kalogeiton, Vicky
    COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 433 - 450
  • [30] Trigger videos on the Web: Impact of audiovisual design
    Verleur, Ria
    Heuvelman, Ard
    Verhagen, Plon W.
    BRITISH JOURNAL OF EDUCATIONAL TECHNOLOGY, 2011, 42 (04) : 573 - 582