AVForensics: Audio-driven Deepfake Video Detection with Masking Strategy in Self-supervision

Cited by: 3
Authors
Zhu Yizhe [1]
Gao Jialin [2]
Zhou Xi [3]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Natl Univ Singapore, Singapore, Singapore
[3] CloudWalk Technol, Shanghai, Peoples R China
Keywords
Deepfake detection; audio-visual; masking strategy; self-supervision;
DOI
10.1145/3591106.3592218
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing cross-dataset deepfake detection approaches exploit mouth-related mismatches between the auditory and visual modalities in fake videos to enhance generalisation to unseen forgeries. However, such methods inevitably suffer performance degradation when mouth motions are limited or unaltered. We argue that face forgery detection consistently benefits from using high-level cues across the whole face region. In this paper, we propose a two-phase, audio-driven, multi-modal transformer-based framework, termed AVForensics, to perform deepfake video detection from an audio-visual matching view over the full face. In the first, pretraining phase, we apply a novel uniform masking strategy to model global facial features and learn temporally dense video representations in a self-supervised cross-modal manner, capturing the natural correspondence between the visual and auditory modalities without requiring large-scale labelled data or heavy memory usage. In the second phase, we fine-tune these learned representations for the downstream deepfake detection task, which encourages the model to make accurate predictions based on the captured global facial movement features. Extensive experiments and visualizations on various public datasets demonstrate the superiority of our self-supervised pre-trained method for achieving generalisable and robust deepfake video detection.
Pages: 162-171
Page count: 10