Mandarin-Tibetan Cross-Lingual Voice Conversion System Based on Deep Neural Network

Cited by: 1
Authors
Gan, Zhenye [1, 2]
Xing, Xiaotian [1]
Yang, Hongwu [1, 2]
Zhao, Guangying [1]
Affiliations
[1] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou 730000, Gansu, Peoples R China
[2] Engn Res Ctr Gansu Prov Intelligent Informat Tech, Lanzhou 730000, Gansu, Peoples R China
Source
PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018) | 2018
Funding
National Natural Science Foundation of China
Keywords
Cross-lingual voice conversion; speech recognition; speech synthesis; DNN
DOI
10.1145/3297156.3297221
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a Mandarin-Tibetan cross-lingual voice conversion system intended to ease communication between Mandarin and Tibetan speakers. Mandarin speech recognition and Tibetan speech synthesis techniques based on deep neural networks (DNNs) are adopted to convert Mandarin to Tibetan. This approach avoids the need to build a large parallel corpus or complex conversion rules. The features of the converted Tibetan speech are also modified so that the output is perceived as a sentence uttered by the Mandarin speaker. Experimental results show a mean opinion score (MOS) of 3.26 and a degradation mean opinion score (DMOS) of 3.07 for timbre similarity between the converted Tibetan speech and the Mandarin speech.
Pages: 67-71
Number of pages: 5