Direct Text to Speech Translation System Using Acoustic Units

Cited by: 1

Authors:

Mingote, Victoria [1]

Gimeno, Pablo [1]

Vicente, Luis [1]

Khurana, Sameer [2]

Laurent, Antoine [3]

Duret, Jarod [4]

Affiliations:

[1] Univ Zaragoza, ViVoLab Aragon Inst Engn Res I3A, Zaragoza 50009, Spain

[2] Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA

[3] Le Mans Univ, LIUM, F-72085 Le Mans, France

[4] Avignon Univ, LIA, F-84029 Avignon, France

Source:

IEEE SIGNAL PROCESSING LETTERS | 2023 / Vol. 30

Funding:

European Union Horizon 2020;

Keywords:

Acoustics; Task analysis; Vocoders; Training; Machine translation; Computer architecture; Spectrogram; Acoustic units; CVSS corpus; direct text to speech translation; mBART;

DOI:

10.1109/LSP.2023.3313513

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic and communication technology];

Discipline classification codes:

0808; 0809;

Abstract:

This letter proposes a direct text to speech translation system using discrete acoustic units. This framework employs text in different source languages as input to generate speech in the target language without the need for text transcriptions in this language. Motivated by the success of acoustic units in previous works for direct speech to speech translation systems, we use the same pipeline to extract the acoustic units using a speech encoder combined with a clustering algorithm. Once units are obtained, an encoder-decoder architecture is trained to predict them. Then a vocoder generates speech from units. Our approach for direct text to speech translation was tested on the new CVSS corpus with two different text mBART models employed as initialisation. The systems presented report competitive performance for most of the language pairs evaluated. Besides, results show a remarkable improvement when initialising our proposed architecture with a model pre-trained with more languages.
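The unit-extraction step mentioned in the abstract (a speech encoder combined with a clustering algorithm) can be illustrated with a minimal sketch. The abstract does not name the encoder, the clustering algorithm, or the number of clusters, so everything here is an assumption for illustration: the centroids are taken as already learned (for example by k-means), the frame features are toy 2-D vectors rather than real encoder outputs, and the optional collapsing of repeated units is a convention often used in unit-based work, not something the abstract states. The function names are hypothetical.

```python
from typing import List, Sequence


def nearest_centroid(frame: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to `frame` (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(centroids):
        dist = sum((f - c) ** 2 for f, c in zip(frame, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def frames_to_units(frames: Sequence[Sequence[float]],
                    centroids: Sequence[Sequence[float]],
                    collapse_repeats: bool = True) -> List[int]:
    """Map each frame-level feature vector to a discrete unit (cluster index).

    With `collapse_repeats`, runs of identical consecutive units are merged
    into one unit (an assumed post-processing step, not from the abstract).
    """
    units = [nearest_centroid(frame, centroids) for frame in frames]
    if not collapse_repeats:
        return units
    collapsed: List[int] = []
    for unit in units:
        if not collapsed or collapsed[-1] != unit:
            collapsed.append(unit)
    return collapsed


# Toy stand-ins: three "learned" centroids and five encoder frames.
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, 0.0], [0.2, 0.1], [0.9, 1.1], [0.1, 0.9], [0.0, 1.2]]

print(frames_to_units(frames, centroids, collapse_repeats=False))  # [0, 0, 1, 2, 2]
print(frames_to_units(frames, centroids))                          # [0, 1, 2]
```

In a pipeline of the kind the abstract describes, unit sequences like these would serve as the prediction targets for the encoder-decoder model, and a vocoder would turn predicted units back into speech.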

Pages: 1262-1266

Page count: 5
