AntiFake: Using Adversarial Audio to Prevent Unauthorized Speech Synthesis

Cited by: 3
Authors
Yu, Zhiyuan [1]
Zhai, Shixuan [1]
Zhang, Ning [1]
Affiliations
[1] Washington Univ, St Louis, MO 63110 USA
Keywords
Adversarial Machine Learning; Generative AI; Speech Synthesis; DeepFake Defense
DOI
10.1145/3576915.3623209
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The rapid development of deep neural networks and generative AI has catalyzed growth in realistic speech synthesis. While this technology has great potential to improve lives, it has also led to the emergence of "DeepFake" speech, in which synthesized speech is misused to deceive humans and machines for nefarious purposes. In response to this evolving threat, there has been significant interest in mitigation through DeepFake detection. Complementary to existing work, we propose a preventative approach and introduce AntiFake, a defense mechanism that relies on adversarial examples to prevent unauthorized speech synthesis. To ensure transferability to attackers' unknown synthesis models, an ensemble learning approach is adopted to improve the generalizability of the optimization process. To validate the efficacy of the proposed system, we evaluated AntiFake against five state-of-the-art synthesizers using real-world DeepFake speech samples. The experiments indicated that AntiFake achieved a protection rate of over 95%, even against unknown black-box models. We also conducted usability tests involving 24 human participants to ensure the solution is accessible to diverse populations.
Pages: 460-474
Number of pages: 15