UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS USING TWO-PASS DECISION TREE CONSTRUCTION

Cited by: 9
Authors
Gibson, Matthew [1]
Hirsimaki, Teemu [2]
Karhila, Reima [2]
Kurimo, Mikko [2]
Byrne, William [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England
[2] Aalto Univ, FIN-5400 Helsinki, Finland
Keywords
HMM-based speech synthesis; unsupervised speaker adaptation; cross-lingual
DOI
10.1109/ICASSP.2010.5495196
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper demonstrates how unsupervised cross-lingual adaptation of HMM-based speech synthesis models may be performed without explicit knowledge of the adaptation data language. A two-pass decision tree construction technique is deployed for this purpose. Using parallel translated datasets, cross-lingual and intralingual adaptation are compared in a controlled manner. Listener evaluations reveal that the proposed method delivers performance approaching that of unsupervised intralingual adaptation.
Pages: 4642 - 4645
Page count: 4
Related papers
50 in total
  • [31] Decision Tree-based Clustering with Outlier Detection for HMM-based Speech Synthesis
    Oh, Kyung Hwan
    Sung, June Sig
    Hong, Doo Hwa
    Kim, Nam Soo
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 108 - +
  • [32] Some Aspects of ASR Transcription Based Unsupervised Speaker Adaptation for HMM Speech Synthesis
    Toth, Balint
    Fegyo, Tibor
    Nemeth, Geza
    TEXT, SPEECH AND DIALOGUE, 2010, 6231 : 408 - 415
  • [33] Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
    Tamura, M
    Masuko, T
    Tokuda, K
    Kobayashi, T
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 805 - 808
  • [34] HMM-Based Thai Speech Synthesis Using Unsupervised Stress Context Labeling
    Moungsri, Decha
    Koriyama, Tomoki
    Kobayashi, Takao
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014
  • [35] HMM-Based Persian Speech Synthesis Using Limited Adaptation Data
    Bahmaninezhad, Fahimeh
    Sameti, Hossein
    Khorram, Soheil
    PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 585 - 589
  • [36] HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation
    Nose, Takashi
    Tachibana, Makoto
    Kobayashi, Takao
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2009, E92D (03): : 489 - 497
  • [37] Cross-lingual, Multi-speaker Text-To-Speech Synthesis Using Neural Speaker Embedding
    Chen, Mengnan
    Chen, Minchuan
    Liang, Shuang
    Ma, Jun
    Chen, Lei
    Wang, Shaojun
    Xiao, Jing
    INTERSPEECH 2019, 2019, : 2105 - 2109
  • [38] CROSS-LINGUAL TEXT-INDEPENDENT SPEAKER VERIFICATION USING UNSUPERVISED ADVERSARIAL DISCRIMINATIVE DOMAIN ADAPTATION
    Xia, Wei
    Huang, Jing
    Hansen, John H. L.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5816 - 5820
  • [39] EFFECTIVE SENTENCE SELECTION BASED ON PHONE/MODEL COVERAGE MAXIMIZATION FOR SPEAKER ADAPTATION IN HMM-BASED SPEECH SYNTHESIS
    Lin, Cheng Hsien
    Huang, Po Kai
    Lin, Cheng Yuan
    Kuo, Chih Chung
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 74 - 78
  • [40] Using speaker adaptive training to realize Mandarin-Tibetan cross-lingual speech synthesis
    Yang, Hongwu
    Oura, Keiichiro
    Wang, Haiyan
    Gan, Zhenye
    Tokuda, Keiichi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (22) : 9927 - 9942