Direct Text to Speech Translation System Using Acoustic Units

Authors
Mingote, Victoria [1]
Gimeno, Pablo [1]
Vicente, Luis [1]
Khurana, Sameer [2]
Laurent, Antoine [3]
Duret, Jarod [4]
Affiliations
[1] Univ Zaragoza, ViVoLab Aragon Inst Engn Res I3A, Zaragoza 50009, Spain
[2] Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[3] Le Mans Univ, LIUM, F-72085 Le Mans, France
[4] Avignon Univ, LIA, F-84029 Avignon, France
Funding
European Union Horizon 2020
Keywords
Acoustics; Task analysis; Vocoders; Training; Machine translation; Computer architecture; Spectrogram; Acoustic units; CVSS corpus; direct text to speech translation; mBART;
DOI
10.1109/LSP.2023.3313513
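Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract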
This letter proposes a direct text-to-speech translation system based on discrete acoustic units. The framework takes text in different source languages as input and generates speech in the target language without requiring text transcriptions in the target language. Motivated by the success of acoustic units in previous work on direct speech-to-speech translation, we use the same pipeline to extract the acoustic units: a speech encoder combined with a clustering algorithm. Once the units are obtained, an encoder-decoder architecture is trained to predict them from the source text, and a vocoder then generates speech from the predicted units. Our approach to direct text-to-speech translation was evaluated on the new CVSS corpus, using two different text mBART models for initialisation. The presented systems achieve competitive performance for most of the evaluated language pairs. Moreover, the results show a remarkable improvement when the proposed architecture is initialised with a model pre-trained on more languages.
Pages: 1262-1266
Number of pages: 5
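As a rough illustration of the unit-extraction step described in the abstract (a speech encoder followed by a clustering algorithm), the sketch below assumes k-means as the clustering algorithm and substitutes random synthetic vectors for the encoder's frame-level features. The encoder itself, the function names (`init_farthest`, `kmeans`, `assign_units`, `dedup`), the number of clusters, and the collapsing of repeated units are illustrative assumptions, not the configuration reported in the letter.

```python
import numpy as np

# Assumption: k-means stands in for the clustering algorithm, and random
# vectors stand in for frame-level features from a (hypothetical) speech encoder.


def assign_units(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each frame to the index of its nearest centroid (its acoustic unit)."""
    # Squared Euclidean distances, shape (num_frames, k).
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def init_farthest(features: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Farthest-point initialisation: start from a random frame, then keep
    adding the frame farthest from all centroids chosen so far."""
    idx = [int(rng.integers(len(features)))]
    for _ in range(k - 1):
        chosen = features[idx]
        d = ((features[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        idx.append(int(d.argmax()))
    return features[idx].copy()


def kmeans(features: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Lloyd's k-means; returns a (k, dim) array of centroids (the unit inventory)."""
    rng = np.random.default_rng(seed)
    centroids = init_farthest(features, k, rng)
    for _ in range(iters):
        labels = assign_units(features, centroids)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids


def dedup(units) -> list[int]:
    """Collapse runs of repeated units, e.g. [5, 5, 5, 2, 2, 7] -> [5, 2, 7]."""
    out: list[int] = []
    for u in map(int, units):
        if not out or out[-1] != u:
            out.append(u)
    return out


# Toy usage: 150 "frames" of 4-D features drawn around three well-separated means.
rng = np.random.default_rng(1)
frames = np.concatenate(
    [rng.normal(loc=m, scale=0.1, size=(50, 4)) for m in (0.0, 5.0, 10.0)]
)
centroids = kmeans(frames, k=3)
units = assign_units(frames, centroids)
print(dedup(units))  # three distinct unit IDs, one per synthetic cluster
```

In this sketch, the resulting unit sequence is the target an encoder-decoder would be trained to predict from source text, and a unit-based vocoder would then map predicted units back to a waveform. Farthest-point initialisation is used here only to keep the toy example deterministic; k-means++ or a library implementation would be the more usual choice in practice.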