AVForensics: Audio-driven Deepfake Video Detection with Masking Strategy in Self-supervision

Cited by: 3
Authors
Zhu Yizhe [1 ]
Gao Jialin [2 ]
Zhou Xi [3 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Natl Univ Singapore, Singapore, Singapore
[3] CloudWalk Technol, Shanghai, Peoples R China
Keywords
Deepfake detection; audio-visual; masking strategy; self-supervision;
DOI
10.1145/3591106.3592218
CLC number
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Existing cross-dataset deepfake detection approaches exploit mouth-related mismatches between the auditory and visual modalities in fake videos to enhance generalisation to unseen forgeries. However, such methods inevitably suffer performance degradation when mouth motions are limited or unaltered; we argue that face forgery detection consistently benefits from high-level cues drawn from the whole face region. In this paper, we propose a two-phase audio-driven multi-modal transformer-based framework, termed AVForensics, which detects deepfake video content from an audio-visual matching view of the full face. In the first, pre-training phase, we apply a novel uniform masking strategy to model global facial features and learn temporally dense video representations in a self-supervised cross-modal manner, capturing the natural correspondence between the visual and auditory modalities without requiring large-scale labelled data or heavy memory usage. In the second phase, we fine-tune these learned representations for the downstream deepfake detection task, which encourages the model to make accurate predictions based on the captured global facial movement features. Extensive experiments and visualizations on various public datasets demonstrate the superiority of our self-supervised pre-trained method in achieving generalisable and robust deepfake video detection.
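The following is a minimal sketch, not the authors' code, of the two-phase idea described in the abstract: (1) self-supervised audio-visual pre-training with a uniform masking strategy over full-face video tokens, and (2) fine-tuning the learned encoder for binary deepfake classification. All module names, dimensions, and the choice of a symmetric InfoNCE objective as the cross-modal correspondence loss are assumptions for illustration; the paper's actual masking and training details may differ.

```python
# Hypothetical sketch of a two-phase masked audio-visual pipeline (not AVForensics itself).
import torch
import torch.nn as nn

def uniform_mask(tokens: torch.Tensor, mask_ratio: float = 0.5) -> torch.Tensor:
    """Keep a uniformly sampled subset of video tokens: (B, N, D) -> (B, N*(1-r), D)."""
    B, N, D = tokens.shape
    keep = int(N * (1.0 - mask_ratio))
    idx = torch.rand(B, N, device=tokens.device).argsort(dim=1)[:, :keep]
    return torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, D))

class AVEncoder(nn.Module):
    """Toy transformer encoders for the visual and auditory streams (dimensions assumed)."""
    def __init__(self, dim: int = 256):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.video_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.audio_enc = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, v_tokens, a_tokens):
        v = self.video_enc(v_tokens).mean(dim=1)   # pooled visual embedding
        a = self.audio_enc(a_tokens).mean(dim=1)   # pooled auditory embedding
        return v, a

def pretrain_loss(v, a, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over matched audio-video pairs, standing in for the
    cross-modal correspondence objective sketched in the abstract."""
    v = nn.functional.normalize(v, dim=-1)
    a = nn.functional.normalize(a, dim=-1)
    logits = v @ a.t() / temperature
    targets = torch.arange(v.size(0), device=v.device)
    return 0.5 * (nn.functional.cross_entropy(logits, targets)
                  + nn.functional.cross_entropy(logits.t(), targets))

# Phase 1: self-supervised pre-training on real talking-face clips (dummy tensors here).
enc = AVEncoder()
video_tokens = torch.randn(8, 64, 256)   # (batch, face-patch tokens, dim)
audio_tokens = torch.randn(8, 32, 256)   # (batch, audio-frame tokens, dim)
v_emb, a_emb = enc(uniform_mask(video_tokens), audio_tokens)
pretrain_loss(v_emb, a_emb).backward()

# Phase 2: fine-tune the visual encoder with a binary real/fake head on labelled videos.
classifier = nn.Linear(256, 2)
real_fake_logits = classifier(enc.video_enc(video_tokens).mean(dim=1))
```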
Pages: 162-171
Page count: 10
Related Papers
50 records in total
  • [1] PVASS-MDD: Predictive Visual-Audio Alignment Self-Supervision for Multimodal Deepfake Detection. Yu, Yang; Liu, Xiaolong; Ni, Rongrong; Yang, Siyuan; Zhao, Yao; Kot, Alex C. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(08): 6926-6936.
  • [2] Photorealistic Audio-driven Video Portraits. Wen, Xin; Wang, Miao; Richardt, Christian; Chen, Ze-Yin; Hu, Shi-Min. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2020, 26(12): 3457-3466.
  • [3] Audio-Driven Emotional Video Portraits. Ji, Xinya; Zhou, Hang; Wang, Kaisiyuan; Wu, Wayne; Loy, Chen Change; Cao, Xun; Xu, Feng. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2021), 2021: 14075-14084.
  • [4] Deep Video Inpainting Guided by Audio-Visual Self-Supervision. Kim, Kyuyeon; Jung, Junsik; Kim, Woo Jae; Yoon, Sung-Eui. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 1970-1974.
  • [5] Audio-Driven Talking Video Frame Restoration. Cheng, Harry; Guo, Yangyang; Yin, Jianhua; Chen, Haonan; Wang, Jiafang; Nie, Liqiang. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 4110-4122.
  • [6] ASVFI: Audio-Driven Speaker Video Frame Interpolation. Wang, Qianrui; Li, Dengshi; Liao, Liang; Song, Hao; Li, Wei; Xiao, Jing. 2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2023: 3200-3204.
  • [7] Audio-driven Talking Face Video Generation with Emotion. Liang, Jiadong; Lu, Feng. 2024 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES ABSTRACTS AND WORKSHOPS (VRW 2024), 2024: 863-864.
  • [8] Pre-Training Audio Representations With Self-Supervision. Tagliasacchi, Marco; Gfeller, Beat; Quitry, Felix de Chaumont; Roblek, Dominik. IEEE SIGNAL PROCESSING LETTERS, 2020, 27: 600-604.
  • [9] Learning to Remove Rain in Video With Self-Supervision. Yang, Wenhan; Tan, Robby T.; Wang, Shiqi; Kot, Alex C.; Liu, Jiaying. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46(03): 1378-1396.
  • [10] Audio-Driven Co-Speech Gesture Video Generation. Liu, Xian; Wu, Qianyi; Zhou, Hang; Du, Yuanqi; Wu, Wayne; Lin, Dahua; Liu, Ziwei. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022.