One-to-Many Voice Conversion Based on Tensor Representation of Speaker Space

Cited by: 0
Authors
Saito, Daisuke [1]
Yamamoto, Keisuke [1]
Minematsu, Nobuaki [1]
Hirose, Keikichi [1]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138654, Japan
Keywords
voice conversion; Gaussian mixture model; eigenvoice; tensor analysis; Tucker decomposition; speech recognition; adaptation
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of speaker space. In voice conversion studies, realizing conversion from or to an arbitrary speaker's voice is an important objective. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In EVC, similarly to speaker recognition approaches, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker GMM. In this speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the Gaussian components and the dimensions of the mean vector, respectively, and the speaker space is derived by tensor analysis of the set of these matrices. Our approach can solve an inherent problem of the supervector representation, and it improves the performance of voice conversion. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
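The difference between the supervector view and the matrix (tensor) view of a speaker can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it uses random numbers in place of real GMM mean vectors, and it computes a Tucker decomposition with a plain truncated higher-order SVD, which is only one of several ways to obtain one. All sizes, ranks, and function names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) are assumptions chosen for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor, ranks):
    """Truncated higher-order SVD, one simple way to get a Tucker decomposition."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from the core and the factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Synthetic stand-in for training data (assumed sizes, not from the paper):
# 20 speakers, 8 Gaussian components, 4-dimensional mean vectors.
rng = np.random.default_rng(0)
num_speakers, num_components, dim = 20, 8, 4
speaker_means = rng.normal(size=(num_speakers, num_components, dim))

# Remove the average speaker so the decomposition models deviations from it.
centered = speaker_means - speaker_means.mean(axis=0, keepdims=True)

# Supervector view: each speaker is one long vector of concatenated means.
supervectors = centered.reshape(num_speakers, num_components * dim)

# Tensor view: each speaker stays a (component x dimension) matrix, and the
# stack of matrices is decomposed along all three axes.
ranks = (5, 4, 3)  # illustrative truncation ranks
core, factors = hosvd(centered, ranks)
approx = reconstruct(core, factors)

# With full ranks the decomposition reproduces the tensor exactly.
full_core, full_factors = hosvd(centered, centered.shape)
full_approx = reconstruct(full_core, full_factors)

relative_error = np.linalg.norm(approx - centered) / np.linalg.norm(centered)
print("supervector matrix shape:", supervectors.shape)
print("Tucker core shape:", core.shape)
print("relative error at truncated ranks:", relative_error)
```

In the supervector view, the component and dimension axes are merged into a single axis of length 32, so any basis learned from it mixes the two. In the tensor view, the Gaussian-component axis and the feature-dimension axis each get their own factor matrix, which is the structural distinction the abstract draws.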
Pages: 660-663
Number of pages: 4