One-to-Many Voice Conversion Based on Tensor Representation of Speaker Space

Cited by: 0
Authors
Saito, Daisuke [1]
Yamamoto, Keisuke [1]
Minematsu, Nobuaki [1]
Hirose, Keikichi [1]
Affiliation
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138654, Japan
Keywords
voice conversion; Gaussian mixture model; eigenvoice; tensor analysis; Tucker decomposition; speech recognition; adaptation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of speaker space. In voice conversion studies, realizing conversion from or to an arbitrary speaker's voice is an important objective. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In EVC, as in speaker recognition approaches, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker's GMM. In this speaker space, each speaker is represented by a small number of weight parameters for the eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the Gaussian components and the dimensions of the mean vector, respectively, and the speaker space is derived by tensor analysis of the set of these matrices. Our approach can solve an inherent problem of the supervector representation, and it improves the performance of voice conversion. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
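The contrast the abstract draws between the supervector and matrix (tensor) views of a speaker can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the decomposition is computed or how speaker weights are estimated, so a truncated higher-order SVD stands in here as one common way to obtain a Tucker decomposition. The GMM means are random placeholders, and the sizes `S`, `M`, `D`, the chosen ranks, and all function names are hypothetical.

```python
import numpy as np

def supervector(means):
    """Concatenate the (M, D) component means of one speaker GMM into an (M*D,) vector."""
    return means.reshape(-1)

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor, ranks):
    """Truncated higher-order SVD (assumed stand-in for the paper's tensor analysis)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild a tensor from a Tucker core and its factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Hypothetical sizes: S training speakers, M Gaussian components, D mean dimensions.
S, M, D = 6, 4, 3
rng = np.random.default_rng(0)
means = rng.normal(size=(S, M, D))   # placeholder for trained speaker GMM means

# Supervector view: each speaker is one long vector of length M*D.
supervectors = np.stack([supervector(m) for m in means])

# Tensor view: each speaker stays an (M, D) matrix; stack them and remove the mean speaker.
bias = means.mean(axis=0)
centered = means - bias

# Full ranks reproduce the data exactly; smaller ranks give a compact speaker space.
core_full, factors_full = hosvd(centered, ranks=(S, M, D))
approx_full = reconstruct(core_full, factors_full) + bias
core_small, factors_small = hosvd(centered, ranks=(3, 2, D))
```

In this sketch the supervector view flattens the component and feature axes into one, whereas the tensor view keeps them as separate modes, each with its own factor matrix. Which ranks are appropriate, and how a new speaker's weights are estimated from adaptation data, are left open because the abstract does not specify them.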
Pages: 660-663
Page count: 4