Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery

Cited by: 32
Authors
Fu, Szu-Wei [1 ,2 ]
Li, Pei-Chun [3 ]
Lai, Ying-Hui [4 ]
Yang, Cheng-Chien [3 ]
Hsieh, Li-Chun [3 ]
Tsao, Yu [2 ]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei 11574, Taiwan
[3] Mackay Med Coll, Dept Audiol & Speech Language Pathol, New Taipei, Taiwan
[4] Yuan Ze Univ, Dept Elect Engn, Taoyuan, Taiwan
Keywords
Joint dictionary learning; non-negative matrix factorization (NMF); sparse representation; voice conversion (VC);
DOI
10.1109/TBME.2016.2644258
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication).
Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data.
Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method.
Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario.
Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients.
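The abstract does not give the JD-NMF update rules, so the following is only a rough sketch of one plausible reading of "joint dictionary learning" for NMF-based VC, not the authors' implementation. It assumes that time-aligned source and target magnitude spectra are stacked and factorized with a shared activation matrix, that the Euclidean cost with standard multiplicative updates is used, and that conversion re-estimates activations against the fixed source dictionary. The function names (`nmf_mu`, `train_joint_dictionaries`, `convert`) and all parameters are hypothetical.

```python
import numpy as np


def nmf_mu(V, W=None, rank=8, n_iter=200, update_W=True, seed=0, eps=1e-9):
    """Approximate V (features x frames) as W @ H with non-negative factors.

    Uses multiplicative updates for the Euclidean cost. If W is given and
    update_W is False, only the activations H are estimated.
    """
    rng = np.random.default_rng(seed)
    n_feat, n_frames = V.shape
    if W is None:
        W = rng.random((n_feat, rank)) + eps
    else:
        W = np.array(W, dtype=float)  # copy so the caller's dictionary is untouched
    H = rng.random((W.shape[1], n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if update_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def train_joint_dictionaries(X_src, X_tgt, rank=8, n_iter=200):
    """Learn source and target dictionaries that share one activation matrix.

    X_src and X_tgt are assumed to be non-negative and frame-aligned
    (same number of columns), e.g. after dynamic time warping.
    """
    V = np.vstack([X_src, X_tgt])          # stack features of both speakers
    W, _ = nmf_mu(V, rank=rank, n_iter=n_iter)
    d_src = X_src.shape[0]
    return W[:d_src], W[d_src:]            # split into W_src and W_tgt


def convert(X_src_new, W_src, W_tgt, n_iter=200):
    """Convert new source frames: fit H to W_src, then synthesize with W_tgt."""
    _, H = nmf_mu(X_src_new, W=W_src, n_iter=n_iter, update_W=False)
    return W_tgt @ H
```

In this sketch the learned dictionaries have only `rank` columns, which is the kind of compactness that could make conversion faster than matching against a large exemplar set. Whether this matches the speed-up reported in the paper is not something the abstract confirms.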
Pages: 2584-2594
Number of pages: 11
Related papers
50 in total
  • [1] Multimodal voice conversion based on non-negative matrix factorization
    Kenta Masaka
    Ryo Aihara
    Tetsuya Takiguchi
    Yasuo Ariki
    EURASIP Journal on Audio, Speech, and Music Processing, 2015
  • [2] Multimodal voice conversion based on non-negative matrix factorization
    Masaka, Kenta
    Aihara, Ryo
    Takiguchi, Tetsuya
    Ariki, Yasuo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015
  • [3] VOICE CONVERSION BASED ON NON-NEGATIVE MATRIX FACTORIZATION USING PHONEME-CATEGORIZED DICTIONARY
    Aihara, Ryo
    Nakashika, Toru
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [4] Voice Conversion based on Non-negative Matrix Factorization in Noisy Environments
    Fujii, Takao
    Aihara, Ryo
    Takashima, Ryoichi
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2013 IEEE/SICE INTERNATIONAL SYMPOSIUM ON SYSTEM INTEGRATION (SII), 2013, : 495 - 498
  • [5] Parallel Dictionary Learning for Voice Conversion Using Discriminative Graph-embedded Non-negative Matrix Factorization
    Aihara, Ryo
    Takiguchi, Tetsuya
    Ariki, Yasuo
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 292 - 296
  • [6] The Voice Conversion Method Based on Sparse Convolutive Non-negative Matrix Factorization
    Zhang, Qianmin
    Tao, Liang
    Zhou, Jian
    Wang, Huabin
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON ELECTRICAL AND INFORMATION TECHNOLOGIES FOR RAIL TRANSPORTATION: TRANSPORTATION, 2016, 378 : 259 - 267
  • [7] Many-to-many Voice Conversion Based on Multiple Non-negative Matrix Factorization
    Aihara, Ryo
    Takiguchi, Tetsuya
    Ariki, Yasuo
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2749 - 2753
  • [8] Exemplar-based Emotional Voice Conversion Using Non-negative Matrix Factorization
    Aihara, Ryo
    Ueda, Reina
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014
  • [9] Toward semantic attributes in dictionary learning and non-negative matrix factorization
    Babaee, Mohammadreza
    Wolf, Thomas
    Rigoll, Gerhard
    PATTERN RECOGNITION LETTERS, 2016, 80 : 172 - 178
  • [10] Supervised Dictionary Learning via Non-Negative Matrix Factorization for Classification
    Li, Yifeng
    Ngom, Alioune
    2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 1, 2012, : 439 - 443