Selecting non-uniform units from a very large corpus for concatenative speech synthesizer

Cited by: 0
Authors
Chu, M [1]
Peng, H [1]
Yang, HY [1]
Chang, E [1]
Affiliation
[1] Microsoft Res China, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper proposes a two-module TTS structure that bypasses the prosody model used to predict numerical prosodic parameters for synthetic speech. Instead, the many instances of each basic unit in a large speech corpus are classified into categories by a CART, in which the expectation of the weighted sum of squared regression errors of the prosodic features is used as the splitting criterion. Better prosody is achieved by keeping the diversity in prosodic features small among instances that belong to the same class. A multi-tier non-uniform unit selection method is presented. It makes the best decision on unit selection by minimizing the concatenation cost of a whole utterance. Since the largest available and suitable units are selected for concatenation, distortion caused by mismatches at concatenation points is minimized. According to an informal listening test, the synthesized speech is very natural and fluent.
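The idea that minimizing a whole-utterance cost favors the largest available units can be illustrated with a small dynamic-programming sketch. This is not the paper's method: the function name `select_units`, the flat list-of-labels corpus, and the 0/1 join cost (0 when two chosen instances are adjacent in the corpus, 1 otherwise) are all assumptions made for illustration, and the CART-based classification of instances is left out.

```python
from typing import Dict, List, Optional, Tuple


def select_units(target: List[str], corpus: List[str]) -> Tuple[int, List[int]]:
    """Pick one corpus instance per target unit, minimizing the number of joins.

    Hypothetical cost: joining two instances costs 0 if they are adjacent
    in the corpus (no real concatenation point), and 1 otherwise.
    Returns (total cost, chosen corpus index for each target unit).
    """
    # Index every corpus position by its unit label.
    positions: Dict[str, List[int]] = {}
    for idx, label in enumerate(corpus):
        positions.setdefault(label, []).append(idx)

    candidates = [positions.get(label, []) for label in target]
    if not target or any(not c for c in candidates):
        raise ValueError("target is empty or contains a unit missing from the corpus")

    # layers[i][pos] = (best cost ending at pos, previous position on that path)
    layers: List[Dict[int, Tuple[int, Optional[int]]]] = [
        {pos: (0, None) for pos in candidates[0]}
    ]
    for i in range(1, len(target)):
        layer: Dict[int, Tuple[int, Optional[int]]] = {}
        for pos in candidates[i]:
            cost, prev = min(
                (layers[i - 1][p][0] + (0 if p + 1 == pos else 1), p)
                for p in candidates[i - 1]
            )
            layer[pos] = (cost, prev)
        layers.append(layer)

    # Backtrack from the cheapest final candidate.
    last = min(layers[-1], key=lambda p: (layers[-1][p][0], p))
    total = layers[-1][last][0]
    path = [last]
    for i in range(len(target) - 1, 0, -1):
        last = layers[i][last][1]
        path.append(last)
    path.reverse()
    return total, path


# The run "b c d" exists contiguously at corpus positions 4..6,
# so it is selected as one long unit with no joins.
print(select_units(list("bcd"), list("abcxbcd")))
```

With this toy cost, a path made of long contiguous runs always beats one stitched from short pieces, which mirrors the abstract's claim that selecting the largest suitable units reduces mismatches at concatenation points. A realistic system would replace the 0/1 cost with a measured spectral and prosodic mismatch.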
Pages: 785-788
Number of pages: 4