Fine-Grained Multimodal DeepFake Classification via Heterogeneous Graphs

Authors
Yin, Qilin [1, 2, 3]
Lu, Wei [1, 2, 3]
Cao, Xiaochun [4]
Luo, Xiangyang [5]
Zhou, Yicong [6]
Huang, Jiwu [7]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] Sun Yat Sen Univ, Key Lab Informat Technol, Minist Educ, Guangzhou 510006, Peoples R China
[3] Sun Yat Sen Univ, Guangdong Prov Key Lab Informat Secur Technol, Guangzhou 510006, Peoples R China
[4] Sun Yat Sen Univ, Sch Cyber Sci & Technol, Shenzhen 518107, Peoples R China
[5] State Key Lab Math Engn & Adv Comp, Zhengzhou 450001, Henan, Peoples R China
[6] Univ Macau, Dept Comp & Informat Sci, Macau, Peoples R China
[7] Shenzhen MSU BIT Univ, Fac Engn, Guangdong Lab Machine Percept & Intelligent Comp, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal deepfake classification; Audio-visual model; Graph neural networks; Heterogeneous graphs
DOI
10.1007/s11263-024-02128-1
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The abuse of deepfakes is a well-known problem because deepfakes can cause severe security and privacy harms. The situation is worsening: attackers are no longer limited to unimodal deepfakes but now use multimodal deepfakes, i.e., both audio forgery and video forgery, to better achieve malicious purposes. As multimodal deepfake techniques advance, existing unimodal or ensemble deepfake detectors need fine-grained classification capabilities. To address this gap, we propose a graph attention network based on heterogeneous graphs for fine-grained multimodal deepfake classification, i.e., not only distinguishing authentic samples from forged ones but also identifying the forgery type (video, audio, or both). To this end, we propose a positional coding-based heterogeneous graph construction method that converts an audio-visual sample into a multimodal heterogeneous graph according to relevant hyperparameters. Moreover, a cross-modal graph interaction module is devised that exploits audio-visual synchronization patterns to capture inter-modal complementary information. A de-homogenization graph pooling operation is designed to preserve differences among graph node features, which enhances the graph-level representation. Through the heterogeneous graph attention network, we can efficiently model intra- and inter-modal relationships of multimodal data at both spatial and temporal scales. Extensive experiments on two audio-visual datasets, FakeAVCeleb and LAV-DF, demonstrate that the proposed model obtains significant performance gains over other state-of-the-art competitors. The code is available at https://github.com/yinql1995/Fine-grained-Multimodal-DeepFake-Classification/.
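To make the two central ideas of the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes an audio-visual sample has already been cut into equally many video and audio segments, and it builds a toy heterogeneous graph with two node types and three typed edge sets. The function names (`build_av_graph`, `fine_grained_label`), the parameters `temporal_window` and `sync_window`, the edge-type names, and the class names are all hypothetical; the paper's actual positional coding, hyperparameters, and label names may differ.

```python
from itertools import product


def build_av_graph(num_segments, temporal_window=1, sync_window=0):
    """Build a toy heterogeneous graph for one audio-visual sample.

    Assumption (not from the paper): node i of each modality stands for
    the i-th time segment, so the segment index acts as a crude position.
    """
    # Two node types, one node per time segment in each modality.
    nodes = {
        "video": list(range(num_segments)),
        "audio": list(range(num_segments)),
    }
    # Typed edges keyed by (source type, relation, target type).
    edges = {
        ("video", "temporal", "video"): [],
        ("audio", "temporal", "audio"): [],
        ("video", "sync", "audio"): [],
    }
    for i, j in product(range(num_segments), repeat=2):
        gap = abs(i - j)
        # Intra-modal edges link distinct segments that are close in time.
        if 0 < gap <= temporal_window:
            edges[("video", "temporal", "video")].append((i, j))
            edges[("audio", "temporal", "audio")].append((i, j))
        # Inter-modal edges link video and audio segments that are
        # (approximately) synchronous.
        if gap <= sync_window:
            edges[("video", "sync", "audio")].append((i, j))
    return nodes, edges


def fine_grained_label(video_fake, audio_fake):
    """Map per-modality forgery flags to one fine-grained class name.

    The class names are hypothetical placeholders.
    """
    if video_fake and audio_fake:
        return "fake_both"
    if video_fake:
        return "fake_video"
    if audio_fake:
        return "fake_audio"
    return "real"


if __name__ == "__main__":
    nodes, edges = build_av_graph(num_segments=4)
    for edge_type, pairs in edges.items():
        print(edge_type, len(pairs))
    print(fine_grained_label(video_fake=True, audio_fake=False))
```

Keeping the edge sets keyed by a (source type, relation, target type) triple mirrors how heterogeneous graph libraries commonly organize typed edges, so a graph attention layer can apply separate weights per relation. Widening `sync_window` would let each video segment attend to neighbouring audio segments as well as the aligned one.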
Pages: 5255-5269
Number of pages: 15