Tensor-based Speaker Space Construction for Arbitrary Speaker Conversion

Citations: 0
Authors
Saito, Daisuke [1]
Minematsu, Nobuaki [1]
Hirose, Keikichi [1]
Affiliations
[1] Univ Tokyo, Tokyo, Japan
Keywords
Voice conversion; Gaussian mixture model; tensor analysis; Tucker decomposition; VOICE CONVERSION; SPEECH RECOGNITION; ADAPTATION;
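DOI
Not available
Chinese Library Classification (CLC) codes
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification codes
0808; 0809;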
Abstract
This paper describes a novel approach to flexible control of speaker characteristics using tensor representation of speaker space. In voice conversion studies, realization of conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In the EVC, a speaker space is constructed based on GMM supervectors which are high-dimensional vectors derived by concatenating the mean vectors of each of the speaker GMMs. In the speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit construction of the speaker space by introducing the tensor analysis of training data set. In our approach, each speaker is represented as a matrix of which the row and the column respectively correspond to the Gaussian component and the dimension of the mean vector, and the speaker space is derived by the tensor analysis of the set of the matrices. Our approach can solve an inherent problem of supervector representation, and it improves the performance of voice conversion. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
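The contrast the abstract draws between the supervector view and the matrix (tensor) view of a speaker can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the data are random, the sizes (10 speakers, 4 Gaussian components, 3-dimensional means) and ranks are arbitrary, and the helper names `unfold`, `mode_product`, `truncated_hosvd`, and `reconstruct` are hypothetical. A truncated higher-order SVD is used here as one common way to compute a Tucker decomposition; the paper's actual training procedure may differ.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)

def truncated_hosvd(tensor, ranks):
    """Tucker decomposition via truncated higher-order SVD (illustrative)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from the core and the factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Assumed toy sizes: speakers x Gaussian components x mean dimension.
S, M, D = 10, 4, 3
rng = np.random.default_rng(0)
means = rng.normal(size=(S, M, D))      # synthetic stand-in for GMM means
centered = means - means.mean(axis=0)   # remove the speaker-average means

# Supervector view: concatenate each speaker's M mean vectors into one vector,
# then take the leading principal directions as "eigen-supervectors".
supervectors = centered.reshape(S, M * D)
_, _, vt = np.linalg.svd(supervectors, full_matrices=False)
eigen_supervectors = vt[:5]
pca_weights = supervectors @ eigen_supervectors.T   # one weight vector per speaker

# Tensor view: keep each speaker as an M x D matrix and decompose the stack.
core, factors = truncated_hosvd(centered, ranks=(5, 2, 2))
speaker_weights = factors[0]                        # one row per speaker
approx = reconstruct(core, factors)
print("relative error:", np.linalg.norm(approx - centered) / np.linalg.norm(centered))
```

In the supervector view, the component and dimension axes are flattened together, whereas the tensor view keeps them as separate modes with their own factor matrices. On random data the truncated reconstruction is only a rough approximation; the sketch is meant to show the shapes involved, not to reproduce the reported results.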
Pages: 595 - 598
Page count: 4
Related papers
50 in total
  • [1] Effects of Speaker Adaptive Training on Tensor-based Arbitrary Speaker Conversion
    Saito, Daisuke
    Minematsu, Nobuaki
    Hirose, Keikichi
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 98 - 101
  • [2] Tensor Factor Analysis for Arbitrary Speaker Conversion
    Saito, Daisuke
    Minematsu, Nobuaki
    Hirose, Keikichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (06): : 1395 - 1405
  • [3] Arbitrary speaker conversion based on speaker space bases constructed by deep neural networks
    Hashimoto, Tetsuya
    Saito, Daisuke
    Minematsu, Nobuaki
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [4] One-to-Many Voice Conversion Based on Tensor Representation of Speaker Space
    Saito, Daisuke
    Yamamoto, Keisuke
    Minematsu, Nobuaki
    Hirose, Keikichi
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 660 - 663
  • [5] Fast Speaker Identification Based on Speaker Metric Space
    Feng Yong
    Guo Jichuan
    Cao Junhua
    Zhu Lei
    2015 IEEE ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2015, : 1167 - 1171
  • [6] Voice Conversion Across Arbitrary Speakers based on a Single Target-Speaker Utterance
    Liu, Songxiang
    Zhong, Jinghua
    Sun, Lifa
    Wu, Xixin
    Liu, Xunying
    Meng, Helen
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 496 - 500
  • [7] Voice conversion based on static speaker characteristics
    Schwardt, L.C.
    du Preez, J.A.
    Proceedings of the South African Symposium on Communications and Signal Processing, COMSIG, 1998, : 57 - 62
  • [8] Voice conversion based on static speaker characteristics
    Schwardt, LC
    du Preez, JA
    PROCEEDINGS OF THE 1998 SOUTH AFRICAN SYMPOSIUM ON COMMUNICATIONS AND SIGNAL PROCESSING: COMSIG '98, 1998, : 57 - 62
  • [9] Reducing speaker model search space in speaker identification
    De Leon, Phillip L.
    Apsingekar, Vijendra
    2007 BIOMETRICS SYMPOSIUM, 2007, : 90 - 95
  • [10] DODEC speaker construction
    Patrick, Peter
    Canadian Acoustics - Acoustique Canadienne, 2002, 30 (02): : 28 - 29