Multimodal voice conversion based on non-negative matrix factorization

Cited by: 0
Authors
Kenta Masaka
Ryo Aihara
Tetsuya Takiguchi
Yasuo Ariki
Affiliations
[1] Kobe University,Graduate School of System Informatics
[2] Kobe University,Organization of Advanced Science and Technology
Keywords
Voice conversion; Multimodal; Image features; Non-negative matrix factorization; Noise robustness
DOI
Not available
Abstract
A multimodal voice conversion (VC) method for noisy environments is proposed. In our previous non-negative matrix factorization (NMF)-based VC method, source and target exemplars are extracted from parallel training data, in which the source and target speakers utter the same texts. The input source signal is decomposed into source exemplars, noise exemplars, and their weights, and the converted speech is then constructed from the target exemplars and the weights associated with the source exemplars. In this study, we propose multimodal VC that improves the noise robustness of our NMF-based VC method. Furthermore, we introduce a combination weight between audio and visual features and formulate a new cost function to estimate audio-visual exemplars. Using joint audio-visual features as source features improves VC performance compared with a previous audio-input exemplar-based VC method. The effectiveness of the proposed method is confirmed by comparison with a conventional audio-input NMF-based method and a Gaussian mixture model-based method.
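
The audio-only exemplar-based step described in the abstract (decompose the noisy input over source and noise exemplars, then rebuild speech from target exemplars and the source weights) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `estimate_activations` and `convert` are hypothetical, the Euclidean multiplicative update is an assumed stand-in for whatever cost function the paper actually uses, and the audio-visual combination weight is not modeled.

```python
import numpy as np

def estimate_activations(X, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative weights H with X ~= W @ H, keeping dictionary W fixed.

    Assumption: a Euclidean cost with the standard multiplicative update,
    chosen here only for simplicity.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], X.shape[1])) + eps
    WtX = W.T @ X
    WtW = W.T @ W
    for _ in range(n_iter):
        # Multiplying by a non-negative ratio keeps every entry of H non-negative.
        H *= WtX / (WtW @ H + eps)
    return H

def convert(X, src_exemplars, tgt_exemplars, noise_exemplars):
    """Build converted features from target exemplars and source weights.

    X:               (n_features, n_frames) noisy source features, non-negative
    src_exemplars:   (n_features, n_exemplars) source dictionary
    tgt_exemplars:   (n_features, n_exemplars) parallel target dictionary
    noise_exemplars: (n_features, n_noise) noise dictionary
    """
    n_src = src_exemplars.shape[1]
    # Decompose the input over the joint [source | noise] dictionary.
    W = np.hstack([src_exemplars, noise_exemplars])
    H = estimate_activations(X, W)
    # Keep only the weights tied to source exemplars; noise weights are dropped.
    H_src = H[:n_src]
    return tgt_exemplars @ H_src
```

Because the source and target dictionaries are assumed to be frame-aligned (column j of each comes from the same position in the parallel data), reusing the source weights with the target dictionary is what carries the conversion; discarding the noise weights is what gives the noise suppression.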