Direct Text to Speech Translation System Using Acoustic Units

Cited by: 1
Authors
Mingote, Victoria [1]
Gimeno, Pablo [1]
Vicente, Luis [1]
Khurana, Sameer [2]
Laurent, Antoine [3]
Duret, Jarod [4]
Affiliations
[1] Univ Zaragoza, ViVoLab Aragon Inst Engn Res I3A, Zaragoza 50009, Spain
[2] Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[3] Le Mans Univ, LIUM, F-72085 Le Mans, France
[4] Avignon Univ, LIA, F-84029 Avignon, France
Funding
EU Horizon 2020;
Keywords
Acoustics; Task analysis; Vocoders; Training; Machine translation; Computer architecture; Spectrogram; Acoustic units; CVSS corpus; direct text to speech translation; mBART;
DOI
10.1109/LSP.2023.3313513
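Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code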
0808 ; 0809 ;
Abstract
This letter proposes a direct text to speech translation system using discrete acoustic units. This framework employs text in different source languages as input to generate speech in the target language without the need for text transcriptions in this language. Motivated by the success of acoustic units in previous works for direct speech to speech translation systems, we use the same pipeline to extract the acoustic units using a speech encoder combined with a clustering algorithm. Once units are obtained, an encoder-decoder architecture is trained to predict them. Then a vocoder generates speech from units. Our approach for direct text to speech translation was tested on the new CVSS corpus with two different text mBART models employed as initialisation. The systems presented report competitive performance for most of the language pairs evaluated. Besides, results show a remarkable improvement when initialising our proposed architecture with a model pre-trained with more languages.
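The unit-extraction step described in the abstract (a speech encoder followed by a clustering algorithm) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not name the encoder or the clustering method, so the sketch takes already-computed frame features and a hypothetical set of centroids, assigns each frame to its nearest centroid, and collapses consecutive repeated units (a reduction that is common in unit-based pipelines but is not confirmed by this record). The function names `frames_to_units` and `collapse_repeats` are hypothetical.

```python
import numpy as np


def frames_to_units(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each feature frame to the index of its nearest centroid.

    features:  (num_frames, dim) array, assumed to come from a speech encoder.
    centroids: (num_units, dim) array, assumed to come from a clustering step.
    """
    # Squared Euclidean distance between every frame and every centroid.
    diffs = features[:, None, :] - centroids[None, :, :]
    distances = (diffs ** 2).sum(axis=-1)
    return distances.argmin(axis=1)


def collapse_repeats(units: np.ndarray) -> np.ndarray:
    """Remove consecutive duplicate units (assumed post-processing step)."""
    units = np.asarray(units)
    if units.size == 0:
        return units
    keep = np.concatenate(([True], units[1:] != units[:-1]))
    return units[keep]


# Toy data: three hypothetical centroids and six 2-dimensional frames.
centroids = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
features = np.array([
    [0.1, -0.1],
    [0.0, 0.2],
    [0.9, 1.1],
    [1.0, 0.8],
    [0.1, 1.9],
    [0.0, 0.1],
])

units = frames_to_units(features, centroids)
reduced = collapse_repeats(units)
print(units.tolist())    # [0, 0, 1, 1, 2, 0]
print(reduced.tolist())  # [0, 1, 2, 0]
```

In a pipeline of the kind the abstract outlines, a sequence like `reduced` would serve as the prediction target of the text-input encoder-decoder, and a vocoder would then turn predicted units into a waveform; neither of those stages is sketched here.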
Pages: 1262-1266
Page count: 5
Related papers
50 in total
  • [31] Speech translation system
    Unknown
    CHINESE JOURNAL OF ELECTRONICS, 2001, 10 (04): : 443 - 443
  • [32] Designing, Implementing and Testing the Acoustic Component of a Text to Speech System for the Romanian Language
    Boldizsar, Razvan Alin
    Ordean, Mihaela
    Giurgea, Corina
    INFORMATICS IN ECONOMY, 2018, 273 : 101 - 114
  • [33] Direct Speech-to-Image Translation
    Li, Jiguo
    Zhang, Xinfeng
    Jia, Chuanmin
    Xu, Jizheng
    Zhang, Li
    Wang, Yue
    Ma, Siwei
    Gao, Wen
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2020, 14 (03) : 517 - 529
  • [34] Direct Speech Translation for Automatic Subtitling
    Papi, Sara
    Gaido, Marco
    Karakanta, Alina
    Cettolo, Mauro
    Negri, Matteo
    Turchi, Marco
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 1355 - 1376
  • [35] On the Locality of Attention in Direct Speech Translation
    Alastruey, Belen
    Ferrando, Javier
    Gallego, Gerard, I
    Costa-jussa, Marta R.
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): STUDENT RESEARCH WORKSHOP, 2022, : 402 - 412
  • [36] Lost in Translation: Machine Translation and Text-To-Speech in Industry 4.0
    Haslwanter, Jean D. Hallewell
    Heiml, Michael
    Wolfartsberger, Josef
    12TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS (PETRA 2019), 2019, : 333 - 342
  • [37] Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding
    Liu, Yuchen
    Zhang, Jiajun
    Xiong, Hao
    Zhou, Long
    He, Zhongjun
    Wu, Hua
    Wang, Haifeng
    Zong, Chengqing
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8417 - 8424
  • [38] Indonesian Text-To-Speech System Using Syllable Concatenation: Speech Optimization
    Mengko, Richard
    Ayuningtyas, Aulia
    PROCEEDINGS OF 2013 3RD INTERNATIONAL CONFERENCE ON INSTRUMENTATION, COMMUNICATIONS, INFORMATION TECHNOLOGY, AND BIOMEDICAL ENGINEERING (ICICI-BME), 2013, : 412 - 415
  • [39] AN ANALYSIS OF MACHINE TRANSLATION AND SPEECH SYNTHESIS IN SPEECH-TO-SPEECH TRANSLATION SYSTEM
    Hashimoto, Kei
    Yamagishi, Junichi
    Byrne, William
    King, Simon
    Tokuda, Keiichi
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5108 - 5111
  • [40] OCR BASED SPEECH SYNTHESIS SYSTEM USING LAB VIEW Text to Speech Conversion System using OCR
    Mullani, J. J.
    Sankar, M.
    Khade, Priyanka S.
    Sonalkar, Snehal H.
    Patil, Nikita L.
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTING METHODOLOGIES AND COMMUNICATION (ICCMC 2018), 2018, : 7 - 14