UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS USING TWO-PASS DECISION TREE CONSTRUCTION

Cited by: 9
Authors
Gibson, Matthew [1]
Hirsimaki, Teemu [2]
Karhila, Reima [2]
Kurimo, Mikko [2]
Byrne, William [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England
[2] Aalto Univ, FIN-5400 Helsinki, Finland
Keywords
HMM-based speech synthesis; unsupervised speaker adaptation; cross-lingual
DOI
10.1109/ICASSP.2010.5495196
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper demonstrates how unsupervised cross-lingual adaptation of HMM-based speech synthesis models may be performed without explicit knowledge of the adaptation data language. A two-pass decision tree construction technique is deployed for this purpose. Using parallel translated datasets, cross-lingual and intralingual adaptation are compared in a controlled manner. Listener evaluations reveal that the proposed method delivers performance approaching that of unsupervised intralingual adaptation.
Pages: 4642 - 4645
Number of pages: 4
Related papers
50 entries in total
  • [41] Using speaker adaptive training to realize Mandarin-Tibetan cross-lingual speech synthesis
    Hongwu Yang
    Keiichiro Oura
    Haiyan Wang
    Zhenye Gan
    Keiichi Tokuda
    Multimedia Tools and Applications, 2015, 74 : 9927 - 9942
  • [42] Speaker adaptation method for acoustic-to-articulatory inversion using an HMM-based speech production model
    Hiroya, Sadao
    Honda, Masaaki
    IEICE Transactions on Information and Systems, 2004, E87-D (05) : 1071 - 1078
  • [43] Cross-lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space
    Xin, Detai
    Saito, Yuki
    Takamichi, Shinnosuke
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    INTERSPEECH 2020, 2020, : 2947 - 2951
  • [44] Speaker adaptation method for acoustic-to-articulatory inversion using an HMM-based speech production model
    Hiroya, S
    Honda, M
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (05) : 1071 - 1078
  • [45] HMM-BASED SPEECH SYNTHESIS ADAPTATION USING NOISY DATA: ANALYSIS AND EVALUATION METHODS
    Karhila, Reima
    Remes, Ulpu
    Kurimo, Mikko
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6930 - 6934
  • [46] Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis
    Maeno, Yu
    Nose, Takashi
    Kobayashi, Takao
    Koriyama, Tomoki
    Ijima, Yusuke
    Nakajima, Hideharu
    Mizuno, Hideyuki
    Yoshioka, Osamu
    SPEECH COMMUNICATION, 2014, 57 : 144 - 154
  • [47] Two-pass search strategy using accumulated band energy histogram for HMM-based identification of perceptually identical music
    Myung, Jinbok
    Kim, Kwang-Ho
    Park, Jeong-sik
    Koo, Myoung-Wan
    Kim, Ji-Hwan
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2013, 23 (02) : 127 - 132
  • [48] A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis
    Maia, Ranniery
    Toda, Tomoki
    Tokuda, Keiichi
    Sakai, Shinsuke
    Nakamura, Satoshi
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1743 - 1746
  • [49] Unsupervised Speaker Adaptation for DNN-based Speech Synthesis using Input Codes
    Takaki, Shinji
    Nishimura, Yoshikazu
    Yamagishi, Junichi
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 649 - 658
  • [50] Polyglot Speech Synthesis Based on Cross-Lingual Frame Selection Using Auditory and Articulatory Features
    Chen, Chia-Ping
    Huang, Yi-Chin
    Wu, Chung-Hsien
    Lee, Kuan-De
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) : 1558 - 1570