UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS USING TWO-PASS DECISION TREE CONSTRUCTION

Cited by: 9

Authors:

Gibson, Matthew [1]

Hirsimaki, Teemu [2]

Karhila, Reima [2]

Kurimo, Mikko [2]

Byrne, William [1]

Affiliations:

[1] Univ Cambridge, Dept Engn, Trumpington St, Cambridge CB2 1PZ, England

[2] Aalto Univ, FIN-5400 Helsinki, Finland

Source:

2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010

Keywords:

HMM-based speech synthesis; unsupervised speaker adaptation; cross-lingual

DOI:

10.1109/ICASSP.2010.5495196

Chinese Library Classification:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper demonstrates how unsupervised cross-lingual adaptation of HMM-based speech synthesis models may be performed without explicit knowledge of the adaptation data language. A two-pass decision tree construction technique is deployed for this purpose. Using parallel translated datasets, cross-lingual and intralingual adaptation are compared in a controlled manner. Listener evaluations reveal that the proposed method delivers performance approaching that of unsupervised intralingual adaptation.

Pages: 4642-4645

Number of pages: 4
