Mandarin-Tibetan Cross-Lingual Voice Conversion System Based on Deep Neural Network

Times cited: 1

Authors:

Gan, Zhenye [1,2]

Xing, Xiaotian [1]

Yang, Hongwu [1,2]

Zhao, Guangying [1]

Affiliations:

[1] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou 730000, Gansu, Peoples R China

[2] Engn Res Ctr Gansu Prov Intelligent Informat Tech, Lanzhou 730000, Gansu, Peoples R China

Source:

PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018) | 2018

Funding:

National Natural Science Foundation of China

Keywords:

Cross-lingual voice conversion; speech recognition; speech synthesis; DNN

DOI:

10.1145/3297156.3297221

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper presents a Mandarin-Tibetan cross-lingual voice conversion system intended to ease communication between Mandarin speakers and Tibetan speakers. Mandarin speech is converted to Tibetan by combining Mandarin speech recognition with Tibetan speech synthesis, both based on deep neural networks (DNNs). This approach avoids having to build a large parallel corpus or design complex conversion rules. In addition, we modify the converted Tibetan speech features so that the result is perceived as a sentence uttered by the Mandarin speaker. Experimental results show a Mean Opinion Score (MOS) of 3.26 and a degradation mean opinion score (DMOS) of 3.07 for timbre similarity between the converted Tibetan speech and the Mandarin speech.
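MOS and DMOS figures like those above are arithmetic means of listener ratings on five-point scales. The following is a minimal Python sketch of that aggregation, under stated assumptions: the helper name `mean_opinion_score`, the 1-to-5 rating range, the normal-approximation 95% interval, and every rating value are hypothetical and are not taken from the paper's listening test.

```python
import math
import statistics


def mean_opinion_score(ratings: list[int]) -> tuple[float, float]:
    """Return (mean score, 95% confidence-interval half-width).

    Hypothetical helper. Assumes integer ratings on a 1-to-5 scale and a
    normal approximation for the interval (1.96 * sample std / sqrt(n));
    neither detail is taken from the paper.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate the spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-to-5 scale")
    mean = statistics.fmean(ratings)
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical listener ratings for illustration only; NOT the paper's data.
# MOS: overall quality/naturalness on a 5-point absolute category rating scale.
mos_ratings = [4, 3, 4, 3, 4, 3, 4, 3]
# DMOS: timbre similarity to the Mandarin speaker on a 5-point degradation
# scale (assumed here: 5 = no perceptible difference, 1 = very different).
dmos_ratings = [3, 3, 4, 2, 3, 3, 4, 3]

for name, ratings in (("MOS", mos_ratings), ("DMOS", dmos_ratings)):
    score, ci = mean_opinion_score(ratings)
    print(f"{name}: {score:.2f} +/- {ci:.2f}")
```

The sample standard deviation (`statistics.stdev`) is used because the listeners are a sample of a larger population; with small listener panels, a Student-t multiplier would be more appropriate than the fixed 1.96.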


Pages: 67-71

Number of pages: 5
