Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario

Cited by: 5
Authors
Bartusiak, Emily R. [1 ]
Delp, Edward J. [1 ]
Institutions
[1] Purdue Univ, Sch Elect & Comp Engn, Video & Image Proc Lab, W Lafayette, IN 47907 USA
Source
2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA | 2022
Keywords
machine learning; deep learning; audio forensics; media forensics; speech synthesizer attribution; open set; spectrogram; transformer; convolutional transformer; tSNE;
D O I
10.1109/ICMLA55696.2022.00054
CLC classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech synthesis methods can create realistic-sounding speech, which may be used for fraud, spoofing, and misinformation campaigns. Forensic methods that detect synthesized speech are important for protection against such attacks. Forensic attribution methods provide even more information about the nature of synthesized speech signals because they identify the specific speech synthesis method (i.e., speech synthesizer) used to create a speech signal. Due to the increasing number of realistic-sounding speech synthesizers, we propose a speech attribution method that generalizes to new synthesizers not seen during training. To do so, we investigate speech synthesizer attribution in both a closed set scenario and an open set scenario. In other words, we consider some speech synthesizers to be "known" synthesizers (i.e., part of the closed set) and others to be "unknown" synthesizers (i.e., part of the open set). We represent speech signals as spectrograms and train our proposed method, known as the compact attribution transformer (CAT), on the closed set for multi-class classification. Then, we extend our analysis to the open set to attribute synthesized speech signals to both known and unknown synthesizers. We utilize a t-distributed stochastic neighbor embedding (tSNE) on the latent space of the trained CAT to differentiate between unknown synthesizers. Additionally, we explore poly-1 loss formulations to improve attribution results. Our proposed approach successfully attributes synthesized speech signals to their respective speech synthesizers in both closed and open set scenarios.
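The poly-1 loss the abstract refers to (from the PolyLoss family) augments standard cross-entropy with a weighted first-order polynomial term, epsilon * (1 - p_t), where p_t is the predicted probability of the true class. A minimal single-example NumPy sketch of that formulation, under the assumption that the paper uses the standard Poly-1 definition (its exact epsilon and batching are not given here):

```python
import numpy as np

def poly1_cross_entropy(logits, target, epsilon=1.0):
    """Poly-1 loss for one example: cross-entropy plus an
    epsilon-weighted (1 - p_t) term, where p_t is the softmax
    probability assigned to the true class."""
    # Softmax with max-subtraction for numerical stability
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    pt = probs[target]
    ce = -np.log(pt)          # standard cross-entropy term
    return ce + epsilon * (1.0 - pt)
```

With epsilon = 0 this reduces to plain cross-entropy; increasing epsilon up-weights examples the model is unsure about (small p_t), which is the knob the paper tunes for attribution.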
Pages: 329 - 336
Page count: 8
Related Papers
50 items total
  • [31] Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation
    Sant, Gerard
    Gállego, Gerard I.
    Alastruey, Belen
    Costa-Jussà, Marta R.
    NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Student Research Workshop, 2022, : 277 - 284
  • [33] Transformer-based Long-context End-to-end Speech Recognition
    Hori, Takaaki
    Moritz, Niko
    Hori, Chiori
    Le Roux, Jonathan
    INTERSPEECH 2020, 2020, : 5011 - 5015
  • [34] Transformer-based neural speech decoding from surface and depth electrode signals
    Chen, Junbo
    Chen, Xupeng
    Wang, Ran
    Le, Chenqian
    Khalilian-Gourtani, Amirhossein
    Jensen, Erika
    Dugan, Patricia
    Doyle, Werner
    Devinsky, Orrin
    Friedman, Daniel
    Flinker, Adeen
    Wang, Yao
    JOURNAL OF NEURAL ENGINEERING, 2025, 22 (01)
  • [35] ScaleFormer: Transformer-based speech enhancement in the multi-scale time domain
    Wu, Tianci
    He, Shulin
    Zhang, Hui
    Zhang, XueLiang
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 2448 - 2453
  • [36] On-device Streaming Transformer-based End-to-End Speech Recognition
    Oh, Yoo Rhee
    Park, Kiyoung
    INTERSPEECH 2021, 2021, : 967 - 968
  • [37] Efficient Transformer-based Speech Enhancement Using Long Frames and STFT Magnitudes
    de Oliveira, Danilo
    Peer, Tal
    Gerkmann, Timo
    INTERSPEECH 2022, 2022, : 2948 - 2952
  • [38] An Investigation of Positional Encoding in Transformer-based End-to-end Speech Recognition
    Yue, Fengpeng
    Ko, Tom
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [39] Transformer-Based Learned Optimization
    Gartner, Erik
    Metz, Luke
    Andriluka, Mykhaylo
    Freeman, C. Daniel
    Sminchisescu, Cristian
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11970 - 11979
  • [40] Transformer-based Image Compression
    Lu, Ming
    Guo, Peiyao
    Shi, Huiqing
    Cao, Chuntong
    Ma, Zhan
    DCC 2022: 2022 DATA COMPRESSION CONFERENCE (DCC), 2022, : 469 - 469