A Hybrid Text-to-Speech Synthesis Using Vowel and Non-Vowel-Like Regions

Cited by: 0
Authors
Adiga, Nagaraj [1 ]
Prasanna, S. R. Mahadeva [1 ]
Affiliation
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati, India
Keywords
speech synthesis; unit selection; hybrid TTS; HTS; VLRs and NVLRs; EPOCH EXTRACTION; SELECTION; SYSTEM;
DOI
Not available
CLC Number
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
This paper presents a hybrid text-to-speech synthesis (TTS) approach that combines the advantages of hidden Markov model based speech synthesis (HTS) and unit selection based speech synthesis (USS). In the hybrid TTS, speech sound units are classified into vowel-like regions (VLRs) and non-vowel-like regions (NVLRs) for unit selection. VLRs here refer to vowel, diphthong, semivowel, and nasal sound units [1], which are modeled well in the HMM framework, so their waveform units are taken from HTS. The remaining sound units, such as stop consonants, fricatives, and affricates, which are not modeled properly by HMMs [2], are classified as NVLRs, and for these phonetic classes natural sound units are picked from USS. The VLR and NVLR evidence is obtained from manual and automatic segmentation of the speech signal; automatic detection is performed by fusing source features derived from the Hilbert envelope (HE) and the zero-frequency filtered (ZFF) speech signal. Speech synthesized by the manual and automatic hybrid TTS methods is compared with the HTS and USS voices using subjective and objective measures. The results show that the synthesis quality of the hybrid TTS with manual segmentation is better than that of the HTS voice, whereas the automatic-segmentation variant is of slightly inferior quality.
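The abstract only names the two source features used for automatic VLR detection. As a rough illustration of that idea, the sketch below (assuming NumPy/SciPy, a mono speech signal `x`, and a sampling rate `fs`) computes a Hilbert envelope and a zero-frequency filtered signal and fuses their frame energies into a crude VLR score. The frame sizes, trend-removal window, and the multiplicative fusion are illustrative assumptions, not the authors' exact method (which, for example, typically applies the HE to the LP residual and uses more careful evidence fusion and thresholding).

```python
import numpy as np
from scipy.signal import hilbert, lfilter

def hilbert_envelope(x):
    # Hilbert envelope: magnitude of the analytic signal
    return np.abs(hilbert(x))

def zero_frequency_filter(x, fs, trend_win_ms=10.0):
    # Difference the signal to remove any DC bias
    d = np.diff(x, prepend=x[0])
    # Pass twice through a zero-frequency resonator:
    # y[n] = 2*y[n-1] - y[n-2] + d[n]
    y = d
    for _ in range(2):
        y = lfilter([1.0], [1.0, -2.0, 1.0], y)
    # Remove the slowly varying trend (repeated local-mean subtraction
    # to flatten the polynomial growth of the integrator output)
    win = int(trend_win_ms * 1e-3 * fs) | 1   # odd window length
    kernel = np.ones(win) / win
    for _ in range(3):
        y = y - np.convolve(y, kernel, mode="same")
    return y

def vlr_evidence(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Frame-level fusion of HE and ZFF energies as a toy VLR score."""
    he = hilbert_envelope(x)
    zff = zero_frequency_filter(x, fs)
    frame, hop = int(frame_ms * 1e-3 * fs), int(hop_ms * 1e-3 * fs)
    scores = []
    for start in range(0, len(x) - frame, hop):
        seg = slice(start, start + frame)
        he_energy = np.mean(he[seg] ** 2)
        zff_energy = np.mean(zff[seg] ** 2)
        # Simple multiplicative fusion; normalisation/thresholding omitted
        scores.append(he_energy * zff_energy)
    scores = np.array(scores)
    return scores / (scores.max() + 1e-12)
```

High-scoring frames would correspond to strongly voiced, high-energy regions (vowels, semivowels, nasals), while low-scoring frames would fall in stops, fricatives, and affricates; in the paper's scheme the former would be synthesized with HTS units and the latter with USS units.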
Pages: 5