One-to-Many Voice Conversion Based on Tensor Representation of Speaker Space

Citations: 0
Authors
Saito, Daisuke [1 ]
Yamamoto, Keisuke [1 ]
Minematsu, Nobuaki [1 ]
Hirose, Keikichi [1 ]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138654, Japan
Keywords
voice conversion; Gaussian mixture model; eigenvoice; tensor analysis; Tucker decomposition; SPEECH RECOGNITION; ADAPTATION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of speaker space. In voice conversion studies, conversion from or to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In EVC, as in speaker recognition approaches, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker's GMM. In this speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond, respectively, to the Gaussian components and the dimensions of the mean vector, and the speaker space is derived by tensor analysis of the set of these matrices. Our approach can solve an inherent problem of the supervector representation, and it improves the performance of voice conversion. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
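The matrix-per-speaker representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are random stand-ins for GMM mean vectors, the sizes (8 speakers, 4 Gaussian components, 3 feature dimensions) and all function names are hypothetical, and a truncated higher-order SVD is used here as one simple way to obtain a Tucker decomposition. The abstract does not say which algorithm the paper uses.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize a tensor: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (mode-n product)."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor, ranks):
    """Truncated higher-order SVD, returning a Tucker core and factor matrices."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])  # keep the leading left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each factor basis
    return core, factors

def reconstruct(core, factors):
    """Rebuild the (approximate) tensor from the core and factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Hypothetical sizes: 8 training speakers, 4 Gaussian components, 3 dimensions.
rng = np.random.default_rng(0)
speaker_means = rng.normal(size=(8, 4, 3))  # one (component x dimension) matrix per speaker

# A supervector would flatten each speaker matrix into a single long vector.
supervectors = speaker_means.reshape(8, -1)  # shape (8, 12)

# Tensor view: keep the matrix structure and remove the across-speaker average.
centered = speaker_means - speaker_means.mean(axis=0, keepdims=True)

# Compress only the speaker mode to 3 basis elements (assumed rank).
core, factors = hosvd(centered, ranks=(3, 4, 3))
approx = reconstruct(core, factors)

# Row s of factors[0] is a low-dimensional weight vector for training speaker s.
speaker_weights = factors[0]
```

In this sketch the component and dimension modes stay as separate axes, so the speaker space is built from a three-way array rather than from flattened supervectors. How the paper adapts these weights to a new target speaker is not covered by the abstract and is not shown here.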
Pages: 660-663
Number of pages: 4