Transformer-Based Speech Synthesizer Attribution in an Open Set Scenario

Cited by: 5
|
Authors
Bartusiak, Emily R. [1 ]
Delp, Edward J. [1 ]
Affiliation
[1] Purdue Univ, Sch Elect & Comp Engn, Video & Image Proc Lab, W Lafayette, IN 47907 USA
Source
2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA | 2022
Keywords
machine learning; deep learning; audio forensics; media forensics; speech synthesizer attribution; open set; spectrogram; transformer; convolutional transformer; tSNE;
DOI
10.1109/ICMLA55696.2022.00054
CLC classification code
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech synthesis methods can create realistic-sounding speech, which may be used for fraud, spoofing, and misinformation campaigns. Forensic methods that detect synthesized speech are important for protection against such attacks. Forensic attribution methods provide even more information about the nature of synthesized speech signals because they identify the specific speech synthesis method (i.e., speech synthesizer) used to create a speech signal. Due to the increasing number of realistic-sounding speech synthesizers, we propose a speech attribution method that generalizes to new synthesizers not seen during training. To do so, we investigate speech synthesizer attribution in both a closed set scenario and an open set scenario. In other words, we consider some speech synthesizers to be "known" synthesizers (i.e., part of the closed set) and others to be "unknown" synthesizers (i.e., part of the open set). We represent speech signals as spectrograms and train our proposed method, known as the compact attribution transformer (CAT), on the closed set for multi-class classification. Then, we extend our analysis to the open set to attribute synthesized speech signals to both known and unknown synthesizers. We utilize a t-distributed stochastic neighbor embedding (tSNE) on the latent space of the trained CAT to differentiate between the unknown synthesizers. Additionally, we explore poly-1 loss formulations to improve attribution results. Our proposed approach successfully attributes synthesized speech signals to their respective speech synthesizers in both closed and open set scenarios.
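The abstract mentions training with a poly-1 loss. The paper's exact formulation is not given here, but as a minimal sketch, the standard poly-1 cross-entropy (from the PolyLoss family) augments cross-entropy with a first-order polynomial term ε(1 − p_t), where p_t is the predicted probability of the true class:

```python
import numpy as np

def poly1_cross_entropy(logits, labels, epsilon=1.0):
    """Poly-1 loss: cross-entropy plus epsilon * (1 - p_t), averaged over the batch."""
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Probability assigned to the true class of each example.
    pt = probs[np.arange(len(labels)), labels]
    ce = -np.log(pt)                              # standard cross-entropy per example
    return float((ce + epsilon * (1.0 - pt)).mean())

# Example: two samples, three synthesizer classes (values are illustrative).
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3]])
labels = np.array([0, 1])
loss = poly1_cross_entropy(logits, labels, epsilon=1.0)
```

Setting epsilon to 0 recovers plain cross-entropy; larger epsilon penalizes under-confident correct predictions more strongly, which is the knob such formulations tune. The tSNE step described in the abstract would then be applied to the model's latent features (e.g., with a standard tSNE implementation) to separate clusters belonging to unknown synthesizers.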
Pages: 329-336
Page count: 8
Related papers
50 records in total
  • [21] Simulating reading mistakes for child speech Transformer-based phone recognition
    Gelin, Lucile
    Pellegrini, Thomas
    Pinquier, Julien
    Daniel, Morgane
    INTERSPEECH 2021, 2021, : 3860 - 3864
  • [22] A Transformer-Based End-to-End Automatic Speech Recognition Algorithm
    Dong, Fang
    Qian, Yiyang
    Wang, Tianlei
    Liu, Peng
    Cao, Jiuwen
    IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1592 - 1596
  • [23] Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition
    Zhao, Chendong
    Wang, Jianzong
    Wei, Wenqi
    Qu, Xiaoyang
    Wang, Haoqian
    Xiao, Jing
    2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2022, : 173 - 180
  • [24] T-DVAE: A Transformer-Based Dynamical Variational Autoencoder for Speech
    Perschewski, Jan-Ole
    Stober, Sebastian
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT VII, 2024, 15022 : 33 - 46
  • [25] SIMULTANEOUS SPEECH-TO-SPEECH TRANSLATION SYSTEM WITH TRANSFORMER-BASED INCREMENTAL ASR, MT, AND TTS
    Fukuda, Ryo
    Novitasari, Sashi
    Oka, Yui
    Kano, Yasumasa
    Yano, Yuki
    Ko, Yuka
    Tokuyama, Hirotaka
    Doi, Kosuke
    Yanagita, Tomoya
    Sakti, Sakriani
    Sudoh, Katsuhito
    Nakamura, Satoshi
    2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2021, : 186 - 192
  • [26] TRANSFORMER IN ACTION: A COMPARATIVE STUDY OF TRANSFORMER-BASED ACOUSTIC MODELS FOR LARGE SCALE SPEECH RECOGNITION APPLICATIONS
    Wang, Yongqiang
    Shi, Yangyang
    Zhang, Frank
    Wu, Chunyang
    Chan, Julian
    Yeh, Ching-Feng
    Xiao, Alex
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6778 - 6782
  • [27] Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak
    Lehecka, Jan
    Psutka, Josef V.
    Psutka, Josef
    TEXT, SPEECH, AND DIALOGUE, TSD 2023, 2023, 14102 : 328 - 338
  • [28] An End-to-End Transformer-Based Automatic Speech Recognition for Qur'an Reciters
    Hadwan, Mohammed
    Alsayadi, Hamzah A.
    AL-Hagree, Salah
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (02): : 3471 - 3487
  • [29] Transformer-Based End-to-End Speech Translation With Rotary Position Embedding
    Li, Xueqing
    Li, Shengqiang
    Zhang, Xiao-Lei
    Rahardja, Susanto
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 371 - 375
  • [30] Transformer-Based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project
    Lehecka, Jan
    Psutka, Josef V.
    Psutka, Josef
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 301 - 312