Voice Conversion from Tibetan Amdo Dialect to Tibetan U-tsang Dialect Based on Generative Adversarial Networks

Cited: 0
Authors
Gan Zhenye [1 ,2 ]
Zhao Guangying [1 ]
Yang Hongwu [1 ,2 ]
Xing Xiaotian [1 ]
Jiao Yi [1 ]
Affiliations
[1] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou, Gansu, Peoples R China
[2] Intelligent Informat Proc Center Gansu Prov, Lanzhou, Gansu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Generative Adversarial Networks; Voice Conversion; over-smoothing; Deep Neural Networks;
DOI
10.1109/itaic.2019.8785447
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a Voice Conversion (VC) method from the Tibetan Amdo dialect to the Tibetan U-tsang dialect based on Generative Adversarial Networks (GANs). A persistent problem with the traditional VC framework is that the acoustic feature vectors output by the conversion model are over-smoothed, which degrades the quality of the converted speech. This occurs because, during training, the acoustic model fits a hand-designed probability model to the data distribution, so relatively averaged parameter outputs are treated as optimal; over-smoothing arises whenever the analytical form of the model distribution is artificially specified. To overcome this problem, the proposed VC framework uses GANs as the modeling network of the acoustic model: a generator learns the data distribution directly, while a discriminator guides the generator's training so that the distribution of generated samples approaches that of the target speaker's data, thereby alleviating over-smoothing of the converted speech spectrum. Experimental results show that the proposed method outperforms VC based on Deep Neural Networks (DNNs) in both the sound quality and the similarity of the converted speech.
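The abstract does not give implementation details, but the adversarial training it describes, a generator mapping source-dialect acoustic features toward target-dialect features while a discriminator distinguishes converted features from real target features, can be sketched in miniature. Everything below (1-D features, a linear generator, a logistic discriminator, learning rates) is an illustrative assumption, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Stand-ins for source-dialect and target-dialect acoustic features
# (real systems would use spectral envelopes, not scalars).
SRC_MEAN, TGT_MEAN, STD = 0.0, 3.0, 0.5

# Generator G(x) = W*x + b ; Discriminator D(z) = sigmoid(w*z + c)
W, b = 1.0, 0.0
w, c = 0.0, 0.0
lr = 0.05

for step in range(2000):
    x = rng.normal(SRC_MEAN, STD, size=64)   # source features
    y = rng.normal(TGT_MEAN, STD, size=64)   # real target features
    g = W * x + b                            # converted ("fake") features

    # Discriminator step: minimize -log D(y) - log(1 - D(G(x)))
    d_real = sigmoid(w * y + c)
    d_fake = sigmoid(w * g + c)
    grad_w = np.mean((d_real - 1.0) * y) + np.mean(d_fake * g)
    grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator step: non-saturating loss, minimize -log D(G(x)),
    # pushing the converted distribution toward the target distribution.
    d_fake = sigmoid(w * g + c)
    dg = (d_fake - 1.0) * w                  # dL/dG(x), per sample
    W -= lr * np.mean(dg * x)
    b -= lr * np.mean(dg)

x_test = rng.normal(SRC_MEAN, STD, size=1000)
converted_mean = float(np.mean(W * x_test + b))
print(f"mean of converted features: {converted_mean:.2f} (target mean {TGT_MEAN})")
```

Because the generator is trained only through the discriminator's judgment rather than by minimizing a per-frame distance to a parametric model, there is no pressure toward an "average" output, which is the mechanism the paper credits for reducing over-smoothing.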
Pages: 325-329
Page count: 5
Related Papers
7 records
  • [1] A Streaming End-to-End Speech Recognition Approach Based on WeNet for Tibetan Amdo Dialect
    Wang, Chao
    Wen, Yao
    Lhamo, Phurba
    Tashi, Nyima
    2022 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING, MLNLP 2022, 2022, : 317 - 322
  • [2] Genetic polymorphisms and phylogenetic analyses of the u-Tsang Tibetan from Lhasa based on 30 slowly and moderately mutated Y-STR loci
    Ding, Jiuyang
    Fan, Haoliang
    Zhou, Yongsong
    Wang, Zhuo
    Wang, Xiao
    Song, Xuheng
    Zhu, Bofeng
    Qiu, Pingming
    FORENSIC SCIENCES RESEARCH, 2022, 7 (02) : 181 - 188
  • [3] Construction of surface air temperature over the Tibetan Plateau based on generative adversarial networks
    Yang, Ye
    You, Qinglong
    Jin, Zheng
    Zuo, Zhiyan
    Zhang, Yuqing
    INTERNATIONAL JOURNAL OF CLIMATOLOGY, 2022, 42 (16) : 10107 - 10125
  • [4] One-Shot Voice Conversion Based on Style Generative Adversarial Networks with ESR and DSNet
    Li, Yanping
    Pan, Lei
    Qiu, Xiangtian
    Yang, Zeyu
    Tan, Zhicheng
    Qian, Bo
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, 43 (7) : 4565 - 4587
  • [5] Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks
    Hsu, Chin-Cheng
    Hwang, Hsin-Te
    Wu, Yi-Chiao
    Tsao, Yu
    Wang, Hsin-Min
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3364 - 3368
  • [6] A Survey on Generative Adversarial Networks based Models for Many-to-many Non-parallel Voice Conversion
    Alaa, Yasmin
    Alfonse, Marco
    Aref, Mostafa M.
    5TH INTERNATIONAL CONFERENCE ON COMPUTING AND INFORMATICS (ICCI 2022), 2022, : 221 - 226
  • [7] Voice Conversion from Arbitrary Speakers Based on Deep Neural Networks with Adversarial Learning
    Miyamoto, Sou
    Nose, Takashi
    Ito, Suzunosuke
    Koike, Harunori
    Chiba, Yuya
    Ito, Akinori
    Shinozaki, Takahiro
    ADVANCES IN INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, PT II, 2018, 82 : 97 - 103