Direct Text to Speech Translation System Using Acoustic Units

Cited by: 1

Authors:

Mingote, Victoria [1]

Gimeno, Pablo [1]

Vicente, Luis [1]

Khurana, Sameer [2]

Laurent, Antoine [3]

Duret, Jarod [4]

Affiliations:

[1] Univ Zaragoza, ViVoLab Aragon Inst Engn Res I3A, Zaragoza 50009, Spain

[2] Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA

[3] Le Mans Univ, LIUM, F-72085 Le Mans, France

[4] Avignon Univ, LIA, F-84029 Avignon, France

Source:

IEEE Signal Processing Letters | 2023 / Vol. 30

Funding:

European Union Horizon 2020;

Keywords:

Acoustics; Task analysis; Vocoders; Training; Machine translation; Computer architecture; Spectrogram; Acoustic units; CVSS corpus; direct text to speech translation; mBART;

DOI:

10.1109/LSP.2023.3313513

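Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808; 0809;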

Abstract:

This letter proposes a direct text-to-speech translation system based on discrete acoustic units. The framework takes text in different source languages as input and generates speech in the target language, without requiring text transcriptions in the target language. Motivated by the success of acoustic units in previous work on direct speech-to-speech translation, we use the same pipeline to extract the acoustic units: a speech encoder combined with a clustering algorithm. Once the units are obtained, an encoder-decoder architecture is trained to predict them, and a vocoder then generates speech from the units. Our approach was evaluated on the new CVSS corpus, using two different text mBART models for initialisation. The presented systems achieve competitive performance for most of the language pairs evaluated. Moreover, the results show a remarkable improvement when the proposed architecture is initialised with a model pre-trained on more languages.
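The unit-extraction step summarised in the abstract can be illustrated with a small sketch. The abstract does not name the speech encoder or the clustering algorithm; this example assumes k-means, a common choice in earlier unit-based speech-to-speech translation work, and uses toy 2-D vectors as stand-ins for frame-level encoder features. The names `kmeans` and `frames_to_units` are hypothetical, and collapsing consecutive repeated units is an assumed optional step that the abstract does not state.

```python
import random
from typing import List, Sequence

# Hypothetical stand-in: each "frame" is a feature vector that, in a real
# system, would come from a pre-trained speech encoder (assumption).
Frame = Sequence[float]


def _sq_dist(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(frame: Frame, centroids: List[List[float]]) -> int:
    """Index of the closest centroid (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: _sq_dist(frame, centroids[c]))


def kmeans(frames: List[Frame], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means (assumed clustering method); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]  # initialise from data points
    for _ in range(iters):
        buckets: List[List[Frame]] = [[] for _ in range(k)]
        for f in frames:
            buckets[_nearest(f, centroids)].append(f)
        for j, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                dim = len(bucket[0])
                centroids[j] = [sum(f[d] for f in bucket) / len(bucket) for d in range(dim)]
    return centroids


def frames_to_units(frames: List[Frame], centroids: List[List[float]], dedup: bool = True) -> List[int]:
    """Map each frame to its cluster ID; optionally collapse consecutive repeats (assumed step)."""
    units = [_nearest(f, centroids) for f in frames]
    if dedup:
        units = [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
    return units


if __name__ == "__main__":
    # Toy "features": two well-separated groups standing in for two sound classes.
    frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
    centroids = kmeans(frames, k=2)
    print(frames_to_units(frames, centroids, dedup=False))
    print(frames_to_units(frames, centroids))
```

Each resulting integer is a discrete acoustic unit ID; in the described system, sequences like these would serve as the prediction targets for the encoder-decoder and as the input to the unit-based vocoder.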

Pages: 1262-1266

Number of pages: 5
