A Hybrid Text-to-Speech Synthesis using Vowel and Non Vowel like regions

Cited by: 0
Authors
Adiga, Nagaraj [1]
Prasanna, S. R. Mahadeva [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati, India
Keywords
speech synthesis; unit selection; hybrid TTS; HTS; VLRs and NVLRs; EPOCH EXTRACTION; SELECTION; SYSTEM
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper presents a hybrid text-to-speech (TTS) synthesis approach that combines the advantages of hidden Markov model based speech synthesis (HTS) and unit selection speech synthesis (USS). In the hybrid TTS, speech sound units are classified into vowel-like regions (VLRs) and non-vowel-like regions (NVLRs) for unit selection. The VLRs here refer to vowel, diphthong, semivowel and nasal sound units [1], which can be modeled better within the HMM framework, and hence their waveform units are chosen from HTS. The remaining sound units, such as stop consonants, fricatives and affricates, which are not modeled properly using HMMs [2], are classified as NVLRs, and for these phonetic classes natural sound units are picked from USS. The VLR and NVLR evidence is obtained from manual and automatic segmentation of the speech signal. The automatic detection is done by fusing source features obtained from the Hilbert envelope (HE) and the zero frequency filter (ZFF) of the speech signal. Speech synthesized by the manual and automatic hybrid TTS methods is compared with the HTS and USS voices using subjective and objective measures. Results show that the synthesis quality of the hybrid TTS with manual segmentation is better than that of the HTS voice, whereas automatic segmentation yields slightly inferior quality.
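The routing rule described in the abstract (VLR units from HTS, NVLR units from USS) can be illustrated with a minimal sketch. It is not the authors' implementation. The class labels, the function names `region_of`, `select_units` and `concatenate`, and the assumption that both voices are already segmented at the same boundaries are all hypothetical. The abstract does not say how units are joined, so the sketch simply concatenates them without smoothing.

```python
# Hypothetical class labels: the abstract names the phonetic classes
# but does not specify a phone set or label format.
VLR_CLASSES = {"vowel", "diphthong", "semivowel", "nasal"}
NVLR_CLASSES = {"stop", "fricative", "affricate"}


def region_of(phone_class):
    """Map a phonetic class to 'VLR' or 'NVLR' following the abstract's grouping."""
    if phone_class in VLR_CLASSES:
        return "VLR"
    if phone_class in NVLR_CLASSES:
        return "NVLR"
    raise ValueError(f"unknown phonetic class: {phone_class!r}")


def select_units(segments, hts_units, uss_units):
    """Pick each segment's waveform from HTS (VLR) or USS (NVLR).

    segments  : list of (phone, phone_class) pairs in utterance order
    hts_units : per-segment waveforms (lists of samples) from the HTS voice
    uss_units : per-segment waveforms (lists of samples) from the USS voice
    Assumption: both voices are segmented at the same boundaries.
    """
    if not (len(segments) == len(hts_units) == len(uss_units)):
        raise ValueError("segments, hts_units and uss_units must align")
    chosen = []
    for (phone, phone_class), hts, uss in zip(segments, hts_units, uss_units):
        source = "HTS" if region_of(phone_class) == "VLR" else "USS"
        chosen.append((phone, source, hts if source == "HTS" else uss))
    return chosen


def concatenate(chosen):
    """Join the selected waveforms end to end (no smoothing at the joins)."""
    out = []
    for _, _, samples in chosen:
        out.extend(samples)
    return out
```

With a four-segment utterance such as fricative, vowel, nasal, stop, `select_units` would take the first and last segments from the USS voice and the middle two from the HTS voice. Whether the segment labels come from manual or automatic VLR detection does not change this routing step.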
Pages: 5