AVForensics: Audio-driven Deepfake Video Detection with Masking Strategy in Self-supervision

Cited by: 3
Authors
Zhu Yizhe [1]
Gao Jialin [2]
Zhou Xi [3]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Natl Univ Singapore, Singapore, Singapore
[3] CloudWalk Technol, Shanghai, Peoples R China
Keywords
Deepfake detection; audio-visual; masking strategy; self-supervision
DOI
10.1145/3591106.3592218
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing cross-dataset deepfake detection approaches exploit mouth-related mismatches between the auditory and visual modalities in fake videos to enhance generalisation to unseen forgeries. However, such methods inevitably suffer performance degradation when mouth motions are limited or unaltered. We argue that face forgery detection consistently benefits from using high-level cues across the whole face region. In this paper, we propose a two-phase, audio-driven, multi-modal transformer-based framework, termed AVForensics, to perform deepfake video content detection from an audio-visual matching view over the full face. In the first, pretraining phase, we apply a novel uniform masking strategy to model global facial features and learn temporally dense video representations in a self-supervised cross-modal manner, by capturing the natural correspondence between the visual and auditory modalities without requiring large-scale labelled data or heavy memory usage. In the second phase, we fine-tune these learned representations for the downstream deepfake detection task, which encourages the model to make accurate predictions based on the captured global facial movement features. Extensive experiments and visualizations on various public datasets demonstrate the superiority of our self-supervised pre-trained method for achieving generalisable and robust deepfake video detection.
Pages: 162-171
Page count: 10