A Hybrid Text-to-Speech Synthesis using Vowel and Non Vowel like regions

Cited by: 0
Authors
Adiga, Nagaraj [1]
Prasanna, S. R. Mahadeva [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati, India
Keywords
speech synthesis; unit selection; hybrid TTS; HTS; VLRs and NVLRs; EPOCH EXTRACTION; SELECTION; SYSTEM
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper presents a hybrid text-to-speech (TTS) synthesis approach that combines the advantages of hidden Markov model speech synthesis (HTS) and unit selection speech synthesis (USS). In the hybrid TTS, speech sound units are classified into vowel-like regions (VLRs) and non-vowel-like regions (NVLRs) for unit selection. Here, VLRs refer to vowel, diphthong, semivowel, and nasal sound units [1], which are modeled better by the HMM framework, so their waveform units are chosen from HTS. The remaining sound units, such as stop consonants, fricatives, and affricates, which are not modeled properly using HMMs [2], are classified as NVLRs, and for these phonetic classes natural sound units are picked from USS. The VLR and NVLR evidence is obtained from manual and automatic segmentation of the speech signal. The automatic detection is done by fusing source features obtained from the Hilbert envelope (HE) and the zero frequency filter (ZFF) of the speech signal. Speech synthesized by the manual and automated hybrid TTS methods is compared with the HTS and USS voices using subjective and objective measures. Results show that the synthesis quality of hybrid TTS with manual segmentation is better than that of the HTS voice, whereas automatic segmentation gives slightly inferior quality.
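The routing rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Unit` type, the class labels, and the function names `region`, `source_for`, and `plan` are hypothetical, and the sketch assumes each unit already carries a broad phonetic class (in the paper this evidence comes from manual or automatic segmentation). It only shows which synthesizer each unit would be drawn from.

```python
from dataclasses import dataclass

# Phonetic classes the abstract lists as vowel-like regions (VLRs).
VLR_CLASSES = {"vowel", "diphthong", "semivowel", "nasal"}
# Phonetic classes the abstract lists as non-vowel-like regions (NVLRs).
NVLR_CLASSES = {"stop", "fricative", "affricate"}


@dataclass
class Unit:
    """A speech sound unit with an assumed, pre-assigned phonetic class."""
    phone: str        # phone label (illustrative only)
    phone_class: str  # broad phonetic class, e.g. "vowel" or "stop"


def region(unit: Unit) -> str:
    """Classify a unit as 'VLR' or 'NVLR' from its phonetic class."""
    if unit.phone_class in VLR_CLASSES:
        return "VLR"
    if unit.phone_class in NVLR_CLASSES:
        return "NVLR"
    raise ValueError(f"unknown phonetic class: {unit.phone_class!r}")


def source_for(unit: Unit) -> str:
    """VLR units are taken from HTS; NVLR units are taken from USS."""
    return "HTS" if region(unit) == "VLR" else "USS"


def plan(units: list[Unit]) -> list[tuple[str, str, str]]:
    """Return (phone, region, source) for every unit in the utterance."""
    return [(u.phone, region(u), source_for(u)) for u in units]
```

For a hypothetical three-unit sequence (a fricative, a vowel, and a nasal), `plan` would route the fricative to USS and the vowel and nasal to HTS. How the selected waveforms are joined is outside what the abstract describes and is not sketched here.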
Pages: 5
Related papers
50 in total
  • [1] SIGNIFICANCE OF VOWEL EPENTHESIS IN TELUGU TEXT-TO-SPEECH SYNTHESIS
    Peddinti, Vijayaditya
    Prahallad, Kishore
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011: 5348 - 5351
  • [2] Emotion Classification Using Segmentation of Vowel-Like and Non-Vowel-Like Regions
    Deb, Suman
    Dandapat, Samarendra
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (03) : 360 - 373
  • [3] SVM based Speaker Verification system using Vowel Like and Non-Vowel Like Regions
    Ramya, R.
    Priya, P. Shanmuga
    2014 INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND SIGNAL PROCESSING (ICCSP), 2014
  • [4] Visualising Model Training via Vowel Space for Text-To-Speech Systems
    Abeysinghe, Binu
    James, Jesin
    Watson, Catherine I.
    Marattukalam, Felix
    INTERSPEECH 2022, 2022: 511 - 515
  • [5] APPROACH TO SEGMENTING SPEECH INTO VOWEL-LIKE AND NON-VOWEL-LIKE INTERVALS
    KASUYA, H
    WAKITA, H
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (04): 319 - 327
  • [6] Partial compensation for coarticulatory vowel nasalization across concatenative and neural text-to-speech
    Zellou, Georgia
    Cohn, Michelle
    Block, Aleese
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2021, 149 (05): 3424 - 3436
  • [7] A hybrid model for text-to-speech synthesis
    Violaro, F
    Boeffard, O
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (05): 426 - 434
  • [8] Speech Synthesis of Emotions Using Vowel Features
    Boku, Kanu
    Asada, Taro
    Yoshitomi, Yasunari
    Tabuse, Masayoshi
    INTERNATIONAL JOURNAL OF SOFTWARE INNOVATION, 2013, 1 (01) : 54 - 67
  • [9] Emotion recognition from spontaneous speech using emotional vowel-like regions
    Fahad, Md Shah
    Singh, Shreya
    Abhinav
    Ranjan, Ashish
    Deepak, Akshay
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (10) : 14025 - 14043