One-to-Many Voice Conversion Based on Tensor Representation of Speaker Space

Times cited: 0

Authors:

Saito, Daisuke [1]

Yamamoto, Keisuke [1]

Minematsu, Nobuaki [1]

Hirose, Keikichi [1]

Affiliation:

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138654, Japan

Source:

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011

Keywords:

voice conversion; Gaussian mixture model; eigenvoice; tensor analysis; Tucker decomposition; speech recognition; adaptation

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of the speaker space. In voice conversion studies, realizing conversion from or to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In EVC, as in speaker recognition approaches, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker's GMM. In this speaker space, each speaker is represented by a small number of weights on the eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the Gaussian components and the dimensions of the mean vector, respectively, and the speaker space is derived by tensor analysis of the set of these matrices. Our approach can solve an inherent problem of the supervector representation, and it improves voice conversion performance. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
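As an illustration only, the following Python/NumPy sketch contrasts the two speaker-space constructions described in the abstract. It assumes each training speaker's GMM mean vectors are already available as an M x D matrix (M Gaussian components, D feature dimensions), stacked into an S x M x D array. The supervector route flattens each matrix and applies PCA, as in eigenvoice modeling; the tensor route uses a truncated higher-order SVD (one common way to compute a Tucker decomposition) over the component and dimension modes, so that each speaker is reduced to a small weight matrix. The function names, the ranks `r_comp` and `r_dim`, and the centring by an average speaker matrix are assumptions for illustration. This is not the authors' exact formulation, and the conversion step itself (the EV-GMM mapping) is omitted.

```python
import numpy as np


def supervectors(speaker_means):
    """Flatten each speaker's (M, D) matrix of GMM mean vectors into one M*D supervector."""
    num_speakers = speaker_means.shape[0]
    return speaker_means.reshape(num_speakers, -1)  # shape (S, M*D)


def eigenvoice_basis(speaker_means, k):
    """Supervector route (hypothetical helper): PCA over S supervectors, top-k eigen-supervectors."""
    X = supervectors(speaker_means)
    bias = X.mean(axis=0)                       # average supervector
    # Rows of Vt are the principal directions of the centred supervectors
    _, _, Vt = np.linalg.svd(X - bias, full_matrices=False)
    basis = Vt[:k]                              # (k, M*D) eigen-supervectors
    weights = (X - bias) @ basis.T              # (S, k): k weights per speaker
    return bias, basis, weights


def tucker_speaker_space(speaker_means, r_comp, r_dim):
    """Tensor route (hypothetical helper): truncated HOSVD of the (S, M, D) tensor
    over the Gaussian-component mode and the feature-dimension mode."""
    S, M, D = speaker_means.shape
    bias = speaker_means.mean(axis=0)           # (M, D) average speaker matrix
    T = speaker_means - bias                    # centred tensor
    # Component-mode unfolding (M, S*D); its leading left singular vectors form U
    unfold_comp = T.transpose(1, 0, 2).reshape(M, S * D)
    U = np.linalg.svd(unfold_comp, full_matrices=False)[0][:, :r_comp]  # (M, r_comp)
    # Dimension-mode unfolding (D, S*M); its leading left singular vectors form V
    unfold_dim = T.transpose(2, 0, 1).reshape(D, S * M)
    V = np.linalg.svd(unfold_dim, full_matrices=False)[0][:, :r_dim]    # (D, r_dim)
    # Project each speaker onto both bases: W_s = U^T (X_s - bias) V
    W = np.einsum("mi,smd,dj->sij", U, T, V)    # (S, r_comp, r_dim)
    return bias, U, V, W


def reconstruct(bias, U, V, W_s):
    """Map one speaker's weight matrix back to an (M, D) matrix of mean vectors."""
    return bias + U @ W_s @ V.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random toy data standing in for trained GMM means: 10 speakers, 16 components, 6 dims
    means = rng.normal(size=(10, 16, 6))
    _, _, w_ev = eigenvoice_basis(means, k=5)
    _, _, _, w_tk = tucker_speaker_space(means, r_comp=4, r_dim=3)
    print(w_ev.shape, w_tk.shape)  # (10, 5) (10, 4, 3)
```

In this sketch, with full ranks (`r_comp = M`, `r_dim = D`) the tensor reconstruction reproduces each speaker matrix exactly, and lowering the ranks trades reconstruction accuracy for fewer per-speaker parameters. The supervector PCA, by contrast, yields at most S - 1 non-trivial eigen-supervectors for S training speakers.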


Pages: 660-663

Page count: 4
