Acoustic speech unit segmentation for concatenative synthesis

Cited: 4
Authors
Torres, H. M. [1 ]
Gurlekian, J. A. [1 ]
Institutions
[1] Hosp Clin Buenos Aires, Inst Neurociencias Aplicadas, Consejo Nacl Invest Cient & Tecn, Lab Invest Sensoriales, RA-1120 Buenos Aires, DF, Argentina
Source
COMPUTER SPEECH AND LANGUAGE | 2008, Vol. 22, No. 2
Keywords
Text to speech; Unit segmentation; Corpus-driven; Polyphones;
DOI
10.1016/j.csl.2007.07.002
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Synthesis by concatenation of natural speech improves perceptual results when phonemes and syllables are segmented at places where spectral variations are small [Klatt, D., 1987. Review of text-to-speech conversion for English. J. Acoust. Soc. Am. 82 (3), 737-793]. An automatic segmentation method is explored here, using a tool based on a combination of Entropy Coding, Multiresolution Analysis, and Kohonen's Self-Organizing Maps. The segmentation method imposes no boundaries derived from linguistic units. The resulting waveforms represent phone chains dominated by spectral dynamic structures. Each acoustic unit obtained can be composed of a variety of phonemes, or of a segmented part of a phoneme at the unit boundary. The number of units and the unit structure are speaker dependent, i.e., speaking rate and segmental and suprasegmental distinctive features affect them as the dynamic structure varies. Results obtained from two databases (one male, one female) of 741 sentences each show this dependence, presenting a different number of units and occurrences for each speaker. Nevertheless, both speakers show a high occurrence of three-phoneme (36-24%) and four-phoneme (29-27%) sequences. Vowel-consonant-vowel sequences are the most frequent type (9.7-8.3%). Consonant-vowel syllables, which are phonemically frequent in Spanish (58%), are less represented (6.6-3.2%) using this method. The relevance of half-phone segmentation is verified, given that 66% of the total units for the female speaker and 53% for the male speaker start and end with a segmented phone. Perceptual experiments showed that concatenated speech created with dynamic acoustic units was judged more natural than speech created with diphone units. (C) 2007 Elsevier Ltd. All rights reserved.
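The abstract rests on the principle (after Klatt, 1987) that concatenation points placed where spectral variation is small yield fewer audible joins. The paper's actual tool combines Entropy Coding, Multiresolution Analysis, and Self-Organizing Maps; as a much simpler illustration of the underlying cut-point idea only, the hypothetical sketch below marks candidate boundaries wherever the frame-to-frame spectral change of a feature sequence drops below a threshold. The feature frames, the Euclidean change measure, and the threshold value are all assumptions for illustration, not the authors' method.

```python
import numpy as np

def spectral_change(frames):
    """Euclidean distance between consecutive feature frames
    (frames: array of shape [n_frames, n_features])."""
    return np.linalg.norm(np.diff(frames, axis=0), axis=1)

def boundary_candidates(frames, threshold):
    """Frame indices where spectral change falls below `threshold`:
    candidate cut points, since joins at spectrally stable frames
    tend to concatenate with fewer audible discontinuities."""
    change = spectral_change(frames)
    return np.where(change < threshold)[0] + 1  # index of the later frame

# Toy data: three spectrally stable regions joined by abrupt transitions.
rng = np.random.default_rng(0)

def stable_region(center, n=20, dim=8):
    return np.tile(center, (n, 1)) + 0.01 * rng.standard_normal((n, dim))

frames = np.vstack([stable_region(np.zeros(8)),
                    stable_region(np.ones(8)),
                    stable_region(2.0 * np.ones(8))])
cuts = boundary_candidates(frames, threshold=0.1)
# The two transition frames (indices 20 and 40) are excluded as cut points.
print(cuts)
```

A real system would compute such a measure over cepstral or wavelet-derived features rather than raw toy vectors, and would cluster the resulting segments (e.g., with a Self-Organizing Map, as in the paper) rather than thresholding alone.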
Pages: 196-206 (11 pages)