Multimodal voice conversion based on non-negative matrix factorization

Authors
Kenta Masaka
Ryo Aihara
Tetsuya Takiguchi
Yasuo Ariki
Affiliations
[1] Kobe University, Graduate School of System Informatics
[2] Kobe University, Organization of Advanced Science and Technology
Keywords
Voice conversion; Multimodal; Image features; Non-negative matrix factorization; Noise robustness
DOI
Not available
Abstract
A multimodal voice conversion (VC) method for noisy environments is proposed. In our previous non-negative matrix factorization (NMF)-based VC method, source and target exemplars are extracted from parallel training data, in which the same texts are uttered by the source and target speakers. The input source signal is decomposed into source exemplars, noise exemplars, and their weights, and the converted speech is then constructed from the target exemplars and the weights associated with the source exemplars. In this study, we propose a multimodal VC method that improves the noise robustness of this NMF-based approach. We further introduce a combination weight between the audio and visual features and formulate a new cost function for estimating the audio-visual exemplars. When joint audio-visual features are used as the source features, VC performance improves over that of our previous audio-input exemplar-based VC method. The effectiveness of the proposed method is confirmed through comparison with a conventional audio-input NMF-based method and a Gaussian mixture model-based method.
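The exemplar-based decomposition described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: activations are estimated with standard KL-divergence multiplicative updates (without any sparsity constraint and without the paper's audio-visual cost function), `alpha` is a hypothetical scalar standing in for the audio-visual combination weight, and all features are assumed non-negative with shape (dimension, frames). The function names `estimate_activations`, `joint_features`, `pad_noise`, and `convert` are illustrative only.

```python
import numpy as np


def estimate_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate non-negative activations H such that V ~= W @ H, with W fixed.

    Uses multiplicative updates for the generalized KL divergence. The
    sparsity penalty often added in exemplar-based VC is omitted here.
    """
    H = np.ones((W.shape[1], V.shape[1]))
    norm = W.sum(axis=0)[:, None] + eps   # W^T 1, one value per exemplar
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)         # element-wise V / (W H)
        H *= (W.T @ ratio) / norm
    return H


def joint_features(audio, visual, alpha):
    """Stack audio rows on top of visual rows scaled by a combination weight.

    Assumption: 'alpha' is a hypothetical scalar stand-in for the paper's
    audio-visual combination weight.
    """
    return np.vstack([audio, alpha * visual])


def pad_noise(A_noise_audio, visual_dim):
    """Give audio-only noise exemplars all-zero visual rows.

    Assumption: acoustic noise does not affect the visual stream.
    """
    zeros = np.zeros((visual_dim, A_noise_audio.shape[1]))
    return np.vstack([A_noise_audio, zeros])


def convert(X, A_src, A_noise, A_tgt, n_iter=200):
    """Map noisy source features X to target features.

    A_src and A_tgt are parallel dictionaries: column j of each comes from
    the same time-aligned frame of the parallel training data. A_noise holds
    noise exemplars with the same row layout as A_src.
    """
    W = np.hstack([A_src, A_noise])
    H = estimate_activations(X, W, n_iter)
    H_src = H[: A_src.shape[1]]           # discard the noise activations
    return A_tgt @ H_src                  # rebuild with the target exemplars
```

For audio-visual input, one would pass `joint_features(audio, visual, alpha)` as `X`, stack the source audio and visual exemplars the same way to form `A_src`, and use `pad_noise` to build `A_noise`, while `A_tgt` stays audio-only because only the target speech is reconstructed.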
Related papers
  • [1] Multimodal voice conversion based on non-negative matrix factorization
    Masaka, Kenta
    Aihara, Ryo
    Takiguchi, Tetsuya
    Ariki, Yasuo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015
  • [2] MULTIMODAL VOICE CONVERSION USING NON-NEGATIVE MATRIX FACTORIZATION IN NOISY ENVIRONMENTS
    Masaka, Kenta
    Aihara, Ryo
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [3] Voice Conversion based on Non-negative Matrix Factorization in Noisy Environments
    Fujii, Takao
    Aihara, Ryo
    Takashima, Ryoichi
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2013 IEEE/SICE INTERNATIONAL SYMPOSIUM ON SYSTEM INTEGRATION (SII), 2013: 495-498
  • [4] The Voice Conversion Method Based on Sparse Convolutive Non-negative Matrix Factorization
    Zhang, Qianmin
    Tao, Liang
    Zhou, Jian
    Wang, Huabin
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON ELECTRICAL AND INFORMATION TECHNOLOGIES FOR RAIL TRANSPORTATION: TRANSPORTATION, 2016, 378: 259-267
  • [5] Many-to-many Voice Conversion Based on Multiple Non-negative Matrix Factorization
    Aihara, Ryo
    Takiguchi, Tetsuya
    Ariki, Yasuo
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 2749-2753
  • [6] Exemplar-based Emotional Voice Conversion Using Non-negative Matrix Factorization
    Aihara, Ryo
    Ueda, Reina
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014
  • [7] Multiple Non-Negative Matrix Factorization for Many-to-Many Voice Conversion
    Aihara, Ryo
    Takiguchi, Tetsuya
    Ariki, Yasuo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24(7): 1175-1184
  • [8] INmfCA Algorithm for Training of Nonparallel Voice Conversion Systems Based on Non-Negative Matrix Factorization
    Suda, Hitoshi
    Kotani, Gaku
    Saito, Daisuke
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D(6): 1196-1210
  • [9] ACTIVITY-MAPPING NON-NEGATIVE MATRIX FACTORIZATION FOR EXEMPLAR-BASED VOICE CONVERSION
    Aihara, Ryo
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 4899-4903
  • [10] INDIVIDUALITY-PRESERVING VOICE CONVERSION FOR ARTICULATION DISORDERS BASED ON NON-NEGATIVE MATRIX FACTORIZATION
    Aihara, Ryo
    Takashima, Ryoichi
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 8037-8040