Mandarin-Tibetan Cross-Lingual Voice Conversion System Based on Deep Neural Network

Cited by: 1
Authors
Gan, Zhenye [1, 2]
Xing, Xiaotian [1]
Yang, Hongwu [1, 2]
Zhao, Guangying [1]
Affiliations
[1] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou 730000, Gansu, Peoples R China
[2] Engn Res Ctr Gansu Prov Intelligent Informat Tech, Lanzhou 730000, Gansu, Peoples R China
Source
PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018) | 2018
Funding
National Natural Science Foundation of China;
Keywords
Cross-lingual voice conversion; speech recognition; speech synthesis; DNN;
DOI
10.1145/3297156.3297221
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This paper presents a Mandarin-Tibetan cross-lingual voice conversion system intended to ease communication between Mandarin speakers and Tibetan speakers. Mandarin speech recognition and Tibetan speech synthesis techniques based on deep neural networks (DNNs) are used to convert Mandarin to Tibetan. This approach avoids the need to build a large parallel corpus or complex conversion rules. In addition, the features of the converted Tibetan speech are modified so that it is perceived as a sentence uttered by the Mandarin speaker. Experimental results show a mean opinion score (MOS) of 3.26 and a degradation mean opinion score (DMOS) of 3.07 for the timbre similarity between the converted Tibetan speech and the Mandarin speech.
Pages: 67-71
Number of pages: 5