Phonetic alignment: speech synthesis-based vs. Viterbi-based

Times cited: 34

Authors:

Malfrère, F

Deroo, O

Dutoit, T

Ris, C

Affiliations:

[1] Fac Polytech Mons, TCTS, B-7000 Mons, Belgium

[2] Babel Technol SA, B-7000 Mons, Belgium

Source:

SPEECH COMMUNICATION | 2003 / Vol. 40 / No. 4

Keywords:

speech segmentation; hidden Markov models; hybrid HMM/ANN systems; speech synthesis; large speech corpora

DOI:

10.1016/S0167-6393(02)00131-0

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this paper we compare two different methods for automatically labeling a continuous speech database phonetically, as is usually required when designing a speech recognition or speech synthesis system. The first method is based on temporal alignment of speech with a synthetic speech pattern; the second uses either a continuous-density hidden Markov model (HMM) or a hybrid HMM/ANN (artificial neural network) system in forced-alignment mode. Both systems have been evaluated on read utterances that were not part of the training set of the HMM systems, and compared to manual segmentation. This study outlines the advantages and drawbacks of both methods. The synthesis-based system has the great advantage that no training stage (hence no large labeled database) is needed, while HMM systems easily handle multiple phonetic transcriptions (a phonetic lattice). We deduce a method for the automatic creation of large phonetically labeled speech databases, based on using the synthetic speech segmentation tool to bootstrap the training process of either an HMM or a hybrid HMM/ANN system. Such segmentation tools are key to the development of improved multilingual speech synthesis and recognition systems. (C) 2002 Elsevier Science B.V. All rights reserved.
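The two strategies contrasted in the abstract, and the bootstrap that chains them, can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the authors' system: it assumes one feature vector per frame, a Euclidean local distance with the standard symmetric DTW steps, phone boundaries known exactly in the synthetic utterance, one left-to-right state per phone with unit-variance Gaussian emissions and no transition costs, and a boundary tolerance measured in frames. All function names (`dtw_path`, `transfer_boundaries`, `bootstrap_means`, `gaussian_loglik`, `forced_viterbi`, `boundary_agreement`) and the toy data are hypothetical.

```python
import numpy as np

# Minimal sketch with hypothetical names and toy 1-D features.
# Assumptions: Euclidean DTW, one state per phone, unit-variance Gaussians,
# no transition costs. Not the method or code of the paper.


def dtw_path(x: np.ndarray, y: np.ndarray) -> list[tuple[int, int]]:
    """Optimal DTW path between frame sequences x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    # Euclidean distance between every natural frame and every synthetic frame.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from (n, m) to (0, 0), always moving to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        preds = {(i - 1, j - 1): acc[i - 1, j - 1],
                 (i - 1, j): acc[i - 1, j],
                 (i, j - 1): acc[i, j - 1]}
        i, j = min(preds, key=preds.get)
    return path[::-1]


def transfer_boundaries(path, synth_starts: list[int]) -> list[int]:
    """Map phone start frames of the synthetic utterance onto the natural one."""
    first_natural: dict[int, int] = {}
    for i, j in path:
        first_natural.setdefault(j, i)  # earliest natural frame aligned to j
    return [first_natural[b] for b in synth_starts]


def bootstrap_means(x: np.ndarray, starts: list[int]) -> np.ndarray:
    """One mean feature vector per phone, estimated from a segmentation."""
    ends = starts[1:] + [len(x)]
    return np.array([x[s:e].mean(axis=0) for s, e in zip(starts, ends)])


def gaussian_loglik(x: np.ndarray, means: np.ndarray) -> np.ndarray:
    """Unit-variance Gaussian log-likelihoods (up to a constant), shape (T, P)."""
    return -0.5 * ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)


def forced_viterbi(loglik: np.ndarray) -> list[int]:
    """Align T frames to P phones in transcription order; return phone starts."""
    T, P = loglik.shape
    assert T >= P, "each phone needs at least one frame"
    score = np.full((T, P), -np.inf)
    entered = np.zeros((T, P), dtype=bool)  # True if phone p begins at frame t
    score[0, 0] = loglik[0, 0]
    for t in range(1, T):
        for p in range(P):
            stay = score[t - 1, p]
            enter = score[t - 1, p - 1] if p > 0 else -np.inf
            entered[t, p] = enter > stay
            score[t, p] = max(stay, enter) + loglik[t, p]
    # Backtrack from the last phone at the last frame.
    starts, p = [0] * P, P - 1
    for t in range(T - 1, 0, -1):
        if entered[t, p]:
            starts[p] = t
            p -= 1
    return starts


def boundary_agreement(auto: list[int], manual: list[int], tol: int) -> float:
    """Fraction of automatic boundaries within tol frames of the manual ones."""
    return sum(abs(a - m) <= tol for a, m in zip(auto, manual)) / len(manual)


# Toy 1-D "features": three phones with values 0, 5 and 10.
synth = np.array([0, 0, 5, 5, 5, 10, 10], dtype=float)[:, None]
synth_starts = [0, 2, 5]  # known from the synthesizer
natural = np.array([0, 0, 0, 5, 5, 10, 10, 10, 10], dtype=float)[:, None]
manual = [0, 3, 5]  # hypothetical hand-labeled reference

dtw_starts = transfer_boundaries(dtw_path(natural, synth), synth_starts)
means = bootstrap_means(natural, dtw_starts)  # bootstrap models from DTW labels
hmm_starts = forced_viterbi(gaussian_loglik(natural, means))
print(dtw_starts, hmm_starts, boundary_agreement(hmm_starts, manual, tol=1))
# prints: [0, 3, 5] [0, 3, 5] 1.0
```

In this sketch the DTW step needs no training data, only a synthetic rendition with known boundaries, while the Viterbi step needs phone models, which are estimated here from the DTW segmentation to mirror the bootstrap idea. A practical system would typically use spectral features, multi-state phone HMMs or an HMM/ANN hybrid, and a lattice of alternative pronunciations instead of a single fixed phone sequence.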

Pages: 503-515

Number of pages: 13