One-to-Many Voice Conversion Based on Tensor Representation of Speaker Space

Cited by: 0
Authors
Saito, Daisuke [1]
Yamamoto, Keisuke [1]
Minematsu, Nobuaki [1]
Hirose, Keikichi [1]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo 1138654, Japan
Keywords
voice conversion; Gaussian mixture model; eigenvoice; tensor analysis; Tucker decomposition; SPEECH RECOGNITION; ADAPTATION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of the speaker space. In voice conversion studies, realizing conversion from or to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In EVC, as in speaker recognition approaches, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker's GMM. In this speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond respectively to the Gaussian components and the dimensions of the mean vector, and the speaker space is derived by tensor analysis of the set of these matrices. Our approach can solve an inherent problem of the supervector representation, and it improves the performance of voice conversion. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
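The contrast between the supervector and matrix representations can be illustrated with a small sketch. It is not the authors' implementation: it assumes synthetic random mean vectors, hypothetical sizes (6 speakers, 4 Gaussian components, 3 feature dimensions), and a plain truncated higher-order SVD as one generic way to compute a Tucker decomposition, one of the listed keywords. The function names `unfold`, `mode_product`, `hosvd`, and `reconstruct` are illustrative only.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # shape (..., I_mode)
    result = moved @ matrix.T               # shape (..., J)
    return np.moveaxis(result, -1, mode)


def hosvd(tensor, ranks):
    """Truncated higher-order SVD, a simple (non-iterative) Tucker decomposition."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])         # leading left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Map a core tensor back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Hypothetical sizes, chosen only for illustration.
n_speakers, n_components, n_dims = 6, 4, 3
rng = np.random.default_rng(0)

# One (components x dimensions) matrix of GMM means per speaker,
# stacked into a 3-way tensor. Synthetic data, not real speech features.
means = rng.standard_normal((n_speakers, n_components, n_dims))

# Supervector view: each speaker's matrix is flattened into one long vector,
# so the component/dimension structure is no longer explicit.
supervectors = means.reshape(n_speakers, n_components * n_dims)

# Tensor view: keep the two axes separate and reduce each one independently.
core, factors = hosvd(means, ranks=(2, 3, 3))
approx = reconstruct(core, factors)
rel_error = np.linalg.norm(approx - means) / np.linalg.norm(means)
print("supervector shape:", supervectors.shape)
print("core shape:", core.shape, "relative error:", rel_error)
```

With full ranks the decomposition reproduces the input exactly; truncating the ranks gives a low-dimensional description per mode. How the paper estimates the speaker space and the per-speaker weights is not specified in this record, so the sketch should be read only as an illustration of the data layout.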
Pages: 660-663
Number of pages: 4
Related papers
50 in total
  • [21] A Lookup Tree Based Security for One-To-Many Communication
    Abuelyaman, Eltayeb
    2014 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE, ELECTRONICS AND ELECTRICAL ENGINEERING (ISEEE), VOLS 1-3, 2014, : 1292 - 1296
  • [22] One-shot Voice Conversion with Global Speaker Embeddings
    Lu, Hui
    Wu, Zhiyong
    Dai, Dongyang
    Li, Runnan
    Kang, Shiyin
    Jia, Jia
    Meng, Helen
    INTERSPEECH 2019, 2019, : 669 - 673
  • [23] Speaker and Digit Representation based Voice OTP System
    Nandyala, Sahaja
    Manche, Pavanitha
    Mishra, Jagabandhu
    Prasanna, S. R. Mahadeva
    2024 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM 2024, 2024,
  • [24] Duplication Based One-to-many Coding for Trojan HW Detection
    Keren, Osnat
    Levin, Ilya
    Karpovsky, Mark
    2010 IEEE 25TH INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE IN VLSI SYSTEMS (DFT 2010), 2010, : 160 - 166
  • [25] Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network
    Zhou, Yi
    Tian, Xiaohai
    Das, Rohan Kumar
    Li, Haizhou
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1282 - 1287
  • [26] VOICE CONVERSION IN TIME-INVARIANT SPEAKER-INDEPENDENT SPACE
    Nakashika, Toru
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [27] Effects of Speaker Adaptive Training on Tensor-based Arbitrary Speaker Conversion
    Saito, Daisuke
    Minematsu, Nobuaki
    Hirose, Keikichi
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 98 - 101
  • [28] One-shot Voice Conversion with Speaker-agnostic StarGAN
    Eskimez, Sefik Emre
    Dimitriadis, Dimitrios
    Kumatani, Kenichi
    Gmyr, Robert
    INTERSPEECH 2021, 2021, : 1334 - 1338
  • [29] Fast Many-to-One Voice Conversion using Autoencoders
    Sekii, Yusuke
    Orihara, Ryohei
    Kojima, Keisuke
    Sei, Yuichi
    Tahara, Yasuyuki
    Ohsuga, Akihiko
    ICAART: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 2, 2017, : 164 - 174
  • [30] Non-Parallel Any-to-Many Voice Conversion by Replacing Speaker Statistics
    Liu, Yufei
    Yu, Chengzhu
    Shuai, Wang
    Yang, Zhenchuan
    Chao, Yang
    Zhang, Weibin
    INTERSPEECH 2021, 2021, : 1369 - 1373