Automatic generation of synthesis units for trainable text-to-speech systems

Cited by: 0
Authors
Hon, H [1]
Acero, A [1]
Huang, X [1]
Liu, J [1]
Plumpe, M [1]
Affiliations
[1] Microsoft Corp, Res, Redmond, WA 98052 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
The Whistler Text-to-Speech engine was designed so that its model parameters can be constructed automatically from training data. This paper describes in detail the design issues involved in constructing the synthesis unit inventory automatically from speech databases. The automatic process includes (1) determining scalable synthesis units that reflect the spectral variations of different allophones; (2) segmenting the recorded sentences into phonetic segments; and (3) selecting good instances of each synthesis unit to generate the best synthesized sentence at run time. These processes are all derived using probabilistic learning methods that aim at the same optimization criteria. Through this automatic unit generation, Whistler can automatically produce synthetic speech that sounds very natural and resembles the acoustic characteristics of the original speaker.
Pages: 293-296
Page count: 4