AVForensics: Audio-driven Deepfake Video Detection with Masking Strategy in Self-supervision

Cited by: 3

Authors:

Zhu Yizhe [1]

Gao Jialin [2]

Zhou Xi [3]

Affiliations:

[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China

[2] Natl Univ Singapore, Singapore, Singapore

[3] CloudWalk Technol, Shanghai, Peoples R China

Source:

PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023 | 2023

Keywords:

Deepfake detection; audio-visual; masking strategy; self-supervision;

DOI:

10.1145/3591106.3592218

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing cross-dataset deepfake detection approaches exploit mouth-related mismatches between the auditory and visual modalities in fake videos to enhance generalisation to unseen forgeries. However, such methods inevitably suffer performance degradation when mouth motions are limited or unaltered. We argue that face forgery detection consistently benefits from using high-level cues across the whole face region. In this paper, we propose a two-phase, audio-driven, multi-modal transformer-based framework, termed AVForensics, to perform deepfake video detection from an audio-visual matching view over the full face. In the first (pretraining) phase, we apply a novel uniform masking strategy to model global facial features and learn temporally dense video representations in a self-supervised cross-modal manner, capturing the natural correspondence between the visual and auditory modalities without relying on large-scale labelled data or heavy memory usage. In the second phase, we fine-tune these learned representations for the downstream deepfake detection task, which encourages the model to make accurate predictions based on the captured global facial movement features. Extensive experiments and visualizations on various public datasets demonstrate the superiority of our self-supervised pre-trained method in achieving generalisable and robust deepfake video detection.


Pages: 162-171

Page count: 10

50 items in total

[41] Self-supervision versus synthetic datasets: which is the lesser evil in the context of video denoising?
Dewil, Valery
Barral, Arnaud
Facciolo, Gabriele
Arias, Pablo
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 4896 - 4906
[42] Detection of Critical Structures in Laparoscopic Cholecystectomy Using Label Relaxation and Self-supervision
Owen, David
Grammatikopoulou, Maria
Luengo, Imanol
Stoyanov, Danail
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT IV, 2021, 12904 : 321 - 330
[43] Geometry-driven self-supervision for 3D human pose estimation
Yang, Geon-Jun
Kim, Jun-Hee
Lee, Seong-Whan
NEURAL NETWORKS, 2024, 174
[44] Deep Crash Detection From Vehicular Sensor Data With Multimodal Self-Supervision
Kubin, Luca
Bianconcini, Tommaso
de Andrade, Douglas Coimbra
Simoncini, Matteo
Taccari, Leonardo
Sambo, Francesco
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (08) : 12480 - 12489
[45] Surface-Defect Detection Based on Feature Pyramid Matching and Self-Supervision
Liang Ming
Zhang Minglu
Lu Xiaoling
LASER & OPTOELECTRONICS PROGRESS, 2023, 60 (04)
[46] Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
Haliassos, Alexandros
Mira, Rodrigo
Petridis, Stavros
Pantic, Maja
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 14930 - 14942
[47] ON THE TRANSFERABILITY OF LARGE-SCALE SELF-SUPERVISION TO FEW-SHOT AUDIO CLASSIFICATION
Heggan, Calum
Budgett, Sam
Hospedales, Tim
Yaghoobi, Mehrdad
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 515 - 519
[48] What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection
Zhang, Xiaohui
Yi, Jiangyan
Wang, Chenglong
Zhang, Chu Yuan
Zeng, Siding
Tao, Jianhua
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19569 - 19577
[49] An ensemble of CNNs with self-attention mechanism for DeepFake video detection
Karima Omar
Rasha H. Sakr
Mohammed F. Alrahmawy
Neural Computing and Applications, 2024, 36 : 2749 - 2765
[50] Improving Deepfake Video Detection with Comprehensive Self-consistency Learning
Bao, Heng
Deng, Lirui
Guan, Jiazhi
Zhang, Liang
Chen, Xunxun
CYBER SECURITY, CNCERT 2022, 2022, 1699 : 151 - 161
