Phonetic alignment: speech synthesis-based vs. Viterbi-based

Cited by: 34
Authors
Malfrère, F
Deroo, O
Dutoit, T
Ris, C
Affiliations
[1] Fac Polytech Mons, TCTS, B-7000 Mons, Belgium
[2] Babel Technol SA, B-7000 Mons, Belgium
Keywords
speech segmentation; hidden Markov models; hybrid HMM/ANN systems; speech synthesis; large speech corpora;
DOI
10.1016/S0167-6393(02)00131-0
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
In this paper we compare two methods for automatic phonetic labeling of a continuous speech database, as is usually required when designing a speech recognition or speech synthesis system. The first method is based on temporal alignment of the speech with a synthetic speech pattern; the second uses either continuous density hidden Markov models (HMMs) or a hybrid HMM/ANN (artificial neural network) system in forced alignment mode. Both systems were evaluated on read utterances that were not part of the training set of the HMM systems, and compared with manual segmentation. The study outlines the advantages and drawbacks of both methods. The synthesis-based system has the great advantage that no training stage (and hence no large labeled database) is needed, while HMM systems easily handle multiple phonetic transcriptions (a phonetic lattice). We derive a method for the automatic creation of large phonetically labeled speech databases, in which the synthesis-based segmentation tool bootstraps the training of either an HMM or a hybrid HMM/ANN system. Such segmentation tools are key to the development of improved multilingual speech synthesis and recognition systems. © 2002 Elsevier Science B.V. All rights reserved.
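The idea behind the first method, aligning natural speech with a synthetic pattern whose phone boundaries are already known, can be illustrated with a small sketch. It assumes dynamic time warping (DTW) as the temporal alignment technique and uses toy one-dimensional feature frames; the abstract does not specify the features or the alignment algorithm, and the function names `dtw_path` and `transfer_boundaries` are hypothetical, chosen only for this illustration.

```python
from math import inf


def dtw_path(ref, nat):
    """Return a minimum-cost DTW path as (ref_frame, nat_frame) pairs.

    ref and nat are sequences of feature vectors (lists of floats).
    Illustrative only: plain DTW with Euclidean frame distance is an
    assumption, not the method described in the paper.
    """
    n, m = len(ref), len(nat)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # cost[i][j] = best cumulative cost aligning ref[:i] with nat[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], nat[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])

    # Backtrack from the end of both sequences to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def transfer_boundaries(path, ref_boundaries):
    """Map boundary frames of the synthetic reference onto natural speech.

    Each boundary is the index of the first reference frame of a phone;
    it is mapped to the first natural frame aligned with that frame.
    """
    first_match = {}
    for i, j in path:
        first_match.setdefault(i, j)
    return [first_match[b] for b in ref_boundaries]


# Toy data: the synthetic pattern changes phone at frame 2,
# the (slower) natural utterance changes at frame 3.
synthetic = [[0.0], [0.0], [1.0], [1.0]]
natural = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
print(transfer_boundaries(dtw_path(synthetic, natural), [2]))  # [3]
```

Because the synthetic pattern is generated from the phonetic transcription, its boundaries come for free, which is why this kind of approach needs no labeled training data.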
Pages: 503-515
Number of pages: 13