Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario

Cited by: 5
Authors
Bartusiak, Emily R. [1 ]
Delp, Edward J. [1 ]
Affiliations
[1] Purdue Univ, Sch Elect & Comp Engn, Video & Image Proc Lab, W Lafayette, IN 47907 USA
Source
2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA | 2022
Keywords
machine learning; deep learning; audio forensics; media forensics; speech synthesizer attribution; open set; spectrogram; transformer; convolutional transformer; tSNE;
DOI
10.1109/ICMLA55696.2022.00054
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech synthesis methods can create realistic-sounding speech, which may be used for fraud, spoofing, and misinformation campaigns. Forensic methods that detect synthesized speech are important for protection against such attacks. Forensic attribution methods provide even more information about the nature of synthesized speech signals because they identify the specific speech synthesis method (i.e., speech synthesizer) used to create a speech signal. Due to the increasing number of realistic-sounding speech synthesizers, we propose a speech attribution method that generalizes to new synthesizers not seen during training. To do so, we investigate speech synthesizer attribution in both a closed set scenario and an open set scenario. In other words, we consider some speech synthesizers to be "known" synthesizers (i.e., part of the closed set) and others to be "unknown" synthesizers (i.e., part of the open set). We represent speech signals as spectrograms and train our proposed method, known as the compact attribution transformer (CAT), on the closed set for multi-class classification. Then, we extend our analysis to the open set to attribute synthesized speech signals to both known and unknown synthesizers. We utilize a t-distributed stochastic neighbor embedding (tSNE) on the latent space of the trained CAT to differentiate between the unknown synthesizers. Additionally, we explore poly-1 loss formulations to improve attribution results. Our proposed approach successfully attributes synthesized speech signals to their respective speech synthesizers in both closed and open set scenarios.
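The abstract mentions poly-1 loss formulations. The general poly-1 idea (from the PolyLoss family) augments standard cross-entropy with a first-order polynomial term weighted by a coefficient epsilon; the sketch below is a minimal NumPy illustration of that general formulation, not the paper's exact training setup, and the function names and epsilon value are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def poly1_cross_entropy(logits, targets, epsilon=1.0):
    """Poly-1 loss: cross-entropy plus epsilon * (1 - p_t), where p_t is
    the predicted probability of the target class. With epsilon = 0 this
    reduces to plain cross-entropy."""
    probs = softmax(logits)
    p_t = probs[np.arange(len(targets)), targets]
    ce = -np.log(p_t)
    return float((ce + epsilon * (1.0 - p_t)).mean())
```

Because the added term `epsilon * (1 - p_t)` is non-negative and shrinks as the target-class probability approaches 1, a positive epsilon penalizes low-confidence correct predictions more strongly than plain cross-entropy does.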
Pages: 329-336
Page count: 8