Mandarin-Tibetan Cross-Lingual Voice Conversion System Based on Deep Neural Network

Cited by: 1

Authors:

Gan, Zhenye ^{[1,2]}

Xing, Xiaotian ^{[1]}

Yang, Hongwu ^{[1,2]}

Zhao, Guangying ^{[1]}

Affiliations:

[1] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou 730000, Gansu, Peoples R China

[2] Engn Res Ctr Gansu Prov Intelligent Informat Tech, Lanzhou 730000, Gansu, Peoples R China

Source:

PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018) | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

Cross-lingual voice conversion; speech recognition; speech synthesis; DNN;

DOI:

10.1145/3297156.3297221

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a Mandarin-Tibetan cross-lingual voice conversion system intended to ease communication between Mandarin speakers and Tibetan speakers. Mandarin speech recognition and Tibetan speech synthesis techniques based on deep neural networks (DNNs) are combined to convert Mandarin speech into Tibetan speech. This approach avoids the need to build a large parallel corpus or to write complex conversion rules. The features of the converted Tibetan speech are then modified so that it is perceived as a sentence uttered by the Mandarin speaker. Experimental results show a Mean Opinion Score (MOS) of 3.26 and a degradation mean opinion score (DMOS) of 3.07 for the timbre similarity between the converted Tibetan speech and the Mandarin speech.


Pages: 67-71

Page count: 5
