Mandarin-Tibetan Cross-Lingual Voice Conversion System Based on Deep Neural Network

Cited by: 1
Authors
Gan, Zhenye [1, 2]
Xing, Xiaotian [1]
Yang, Hongwu [1, 2]
Zhao, Guangying [1]
Affiliations
[1] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou 730000, Gansu, Peoples R China
[2] Engn Res Ctr Gansu Prov Intelligent Informat Tech, Lanzhou 730000, Gansu, Peoples R China
Source
PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018) | 2018
Funding
National Natural Science Foundation of China;
Keywords
Cross-lingual voice conversion; speech recognition; speech synthesis; DNN;
DOI
10.1145/3297156.3297221
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
This paper presents a Mandarin-Tibetan cross-lingual voice conversion system intended to ease communication between Mandarin speakers and Tibetan speakers. Mandarin speech recognition and Tibetan speech synthesis techniques based on deep neural networks (DNNs) are adopted to convert Mandarin to Tibetan. This approach avoids the need to build a large parallel corpus or to design complex conversion rules. The features of the converted Tibetan speech are then modified so that the speech is perceived as a sentence uttered by the Mandarin speaker. Experimental results show a mean opinion score (MOS) of 3.26 and a degradation mean opinion score (DMOS) of 3.07 for the timbre similarity between the converted Tibetan speech and the Mandarin speech.
Pages: 67-71
Page count: 5
Related papers
50 in total
  • [1] Deep Learning for Mandarin-Tibetan Cross-Lingual Speech Synthesis
    Zhang, Weizhao
    Yang, Hongwu
    Bu, Xiaolong
    Wang, Lili
    IEEE ACCESS, 2019, 7 : 167884 - 167894
  • [2] A DNN-based Mandarin-Tibetan cross-lingual speech synthesis
    Guo, Weitong
    Yang, Hongwu
    Gan, Zhenye
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1702 - 1707
  • [3] Meta-Learning for Mandarin-Tibetan Cross-Lingual Speech Synthesis
    Zhang, Weizhao
    Yang, Hongwu
    APPLIED SCIENCES-BASEL, 2022, 12 (23):
  • [4] Using speaker adaptive training to realize Mandarin-Tibetan cross-lingual speech synthesis
    Hongwu Yang
    Keiichiro Oura
    Haiyan Wang
    Zhenye Gan
    Keiichi Tokuda
    Multimedia Tools and Applications, 2015, 74 : 9927 - 9942
  • [5] Using speaker adaptive training to realize Mandarin-Tibetan cross-lingual speech synthesis
    Yang, Hongwu
    Oura, Keiichiro
    Wang, Haiyan
    Gan, Zhenye
    Tokuda, Keiichi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (22) : 9927 - 9942
  • [6] A DNN-based framework for converting sign language to Mandarin-Tibetan cross-lingual emotional speech
    Song, Nan
    Yang, Hongwu
    Zhang, Tingting
    2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018, : 209 - 214
  • [7] An Approach to Cross-Lingual Voice Conversion
    Rallabandi, Sai Sirisha
    Gangashetty, Suryakanth V.
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [8] Mandarin-Tibetan Bilingual Cross-language Voice Conversion Based on Semi-hidden Markov Model
    Gan, Zhenye
    Jiang, Jiaolong
    Zhao, Guangying
    Yan, Yajing
    PROCEEDINGS OF 2020 IEEE 4TH INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2020), 2020, : 1075 - 1078
  • [9] A MODULARIZED NEURAL NETWORK WITH LANGUAGE-SPECIFIC OUTPUT LAYERS FOR CROSS-LINGUAL VOICE CONVERSION
    Zhou, Yi
    Tian, Xiaohai
    Yilmaz, Emre
    Das, Rohan Kumar
    Li, Haizhou
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 160 - 167
  • [10] Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN
    Du, Zongyang
    Zhou, Kun
    Sisman, Berrak
    Li, Haizhou
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 507 - 513