AVForensics: Audio-driven Deepfake Video Detection with Masking Strategy in Self-supervision

Cited by: 3

Authors:

Zhu Yizhe [1]

Gao Jialin [2]

Zhou Xi [3]

Affiliations:

[1] Shanghai Jiao Tong University, Shanghai, People's Republic of China

[2] National University of Singapore, Singapore, Singapore

[3] CloudWalk Technology, Shanghai, People's Republic of China

Source:

PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023 | 2023

Keywords:

Deepfake detection; audio-visual; masking strategy; self-supervision

DOI:

10.1145/3591106.3592218

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Existing cross-dataset deepfake detection approaches exploit mouth-related mismatches between the auditory and visual modalities in fake videos to improve generalisation to unseen forgeries. However, such methods inevitably suffer performance degradation when mouth motions are limited or unaltered. We argue that face forgery detection consistently benefits from high-level cues drawn from the whole face region. In this paper, we propose AVForensics, a two-phase, audio-driven, multi-modal transformer-based framework that detects deepfake video content by matching audio and visual information across the full face. In the first (pretraining) phase, we apply a novel uniform masking strategy to model global facial features and learn temporally dense video representations in a self-supervised, cross-modal manner, capturing the natural correspondence between the visual and auditory modalities without relying on large-scale labelled data or incurring heavy memory usage. In the second phase, the learned representations are fine-tuned for the downstream deepfake detection task, which encourages the model to make accurate predictions based on the captured global facial movement features. Extensive experiments and visualizations on various public datasets demonstrate the superiority of our self-supervised pretraining method for achieving generalisable and robust deepfake video detection.
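This record does not describe how the uniform masking strategy works. The sketch below assumes it resembles the uniform sampling used in UM-MAE: the face's patch grid is tiled with 2×2 cells and exactly one patch per cell stays visible, so the visible 25% of patches are spread evenly over the whole face instead of clustering, as purely random masking can. The function name `uniform_mask`, the 2×2 cell size, the 75% masking ratio and the 14×14 patch grid are all hypothetical choices for illustration, not details confirmed for AVForensics.

```python
import random
from typing import List, Optional

# Hypothetical sketch of a "uniform" patch mask, modelled on UM-MAE-style
# uniform sampling; not taken from the AVForensics paper.
def uniform_mask(grid_h: int, grid_w: int, seed: Optional[int] = None) -> List[List[bool]]:
    """Return a grid_h x grid_w mask where True marks a masked patch.

    The grid is tiled with 2x2 cells and exactly one patch per cell is left
    visible, giving a fixed 75% masking ratio spread evenly over the face.
    """
    if grid_h % 2 or grid_w % 2:
        raise ValueError("grid dimensions must be even to tile 2x2 cells")
    rng = random.Random(seed)
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
    # Start fully masked, then unmask one randomly chosen patch per cell.
    mask = [[True] * grid_w for _ in range(grid_h)]
    for top in range(0, grid_h, 2):
        for left in range(0, grid_w, 2):
            dy, dx = rng.choice(offsets)
            mask[top + dy][left + dx] = False
    return mask


if __name__ == "__main__":
    m = uniform_mask(14, 14, seed=0)  # e.g. a 224x224 face crop with 16x16 patches
    visible = sum(not masked for row in m for masked in row)
    print(visible, "of", 14 * 14, "patches visible")  # 49 of 196 patches visible
```

Because every aligned 2×2 cell keeps exactly one visible patch, the encoder always receives some evidence from every neighbourhood of the face, including regions away from the mouth. Whether the method draws a fresh mask for each video frame or shares one mask across frames is likewise an assumption not settled by this record.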

Pages: 162-171

Number of pages: 10
