Improved vowel region detection from a continuous speech using post processing of vowel onset points and vowel end-points

Cited by: 5
Authors
Thirumuru, Ramakrishna [1]
Gangashetty, Suryakanth V. [1]
Vuppala, Anil Kumar [1]
Affiliations
[1] Int Inst Informat Technol Hyderabad, Language Technol Res Ctr, Hyderabad, Andhra Pradesh, India
Keywords
Vowel onset point (VOP); Vowel end-point (VEP); Zero frequency filtering; Magnitude spectrum; Epoch intervals; Strength of the excitation; EXCITATION; SIGNALS
DOI
10.1007/s11042-017-5044-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Vowels are produced with an open configuration of the vocal tract, without any audible friction. The acoustic signal is relatively loud, with varying strength of impulse-like excitation. Vowels carry significant energy in the low-frequency bands of the speech signal. Acoustic events such as the vowel onset point (VOP) and vowel end-point (VEP) can be used as landmarks to detect vowel regions in a speech signal. In this paper, a two-stage algorithm is proposed to detect vowel regions precisely. In the first stage, the speech signal is processed using zero frequency filtering to emphasize the energy content in the low-frequency bands of speech. The zero frequency filtered signal predominantly contains the low-frequency content of the speech signal, as it is filtered around 0 Hz. This is followed by the extraction of dominant spectral peaks from the magnitude spectrum around the glottal closure regions of the speech signal. The VOPs and VEPs are obtained by convolving the enhanced spectral contour of the zero frequency filtered signal with a first-order Gaussian differentiator. In the second stage, post-processing is carried out in the regions around each VOP and VEP to remove spurious vowel regions based on the uniformity of epoch intervals. In addition, the positions of the VOPs and VEPs are corrected using the strength of excitation of the speech signal. The performance of the proposed vowel region detection method is compared with existing state-of-the-art methods on the TIMIT acoustic-phonetic speech corpus. The proposed method is reported to give a significant improvement in vowel region detection in both clean and noisy environments.
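The step in which onset and end points are read off a contour with a first-order Gaussian differentiator can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the kernel width (`sigma`, `half_len`), the relative threshold, and the rectangular test contour standing in for the enhanced spectral contour are all assumptions made for illustration. Zero frequency filtering, spectral peak extraction, and the post-processing stage are not covered.

```python
import math


def fogd_kernel(sigma, half_len):
    """Derivative of a Gaussian sampled at integer offsets -half_len..half_len."""
    return [-m / sigma ** 2 * math.exp(-m * m / (2.0 * sigma ** 2))
            for m in range(-half_len, half_len + 1)]


def convolve_same(signal, kernel):
    """Zero-padded linear convolution whose output is aligned with the input."""
    half = len(kernel) // 2
    n = len(signal)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i - (j - half)  # kernel offset is m = j - half
            if 0 <= idx < n:
                acc += signal[idx] * k
        out[i] = acc
    return out


def detect_vop_vep(evidence, sigma=10.0, half_len=30, rel_threshold=0.5):
    """Return (vop_candidates, vep_candidates, slope_contour).

    Convolving with the Gaussian derivative yields the slope of the smoothed
    contour: strong positive peaks mark rises (onset candidates) and strong
    negative peaks mark falls (end-point candidates).
    The thresholding rule is an illustrative assumption.
    """
    slope = convolve_same(evidence, fogd_kernel(sigma, half_len))
    peak = max((abs(v) for v in slope), default=0.0)
    if peak == 0.0:
        return [], [], slope
    thr = rel_threshold * peak
    vops, veps = [], []
    for i in range(1, len(slope) - 1):
        if slope[i] > thr and slope[i] > slope[i - 1] and slope[i] >= slope[i + 1]:
            vops.append(i)
        if slope[i] < -thr and slope[i] < slope[i - 1] and slope[i] <= slope[i + 1]:
            veps.append(i)
    return vops, veps, slope


if __name__ == "__main__":
    # Hypothetical contour: one "vowel" occupying samples 100..199.
    contour = [0.0] * 100 + [1.0] * 100 + [0.0] * 100
    vops, veps, _ = detect_vop_vep(contour)
    print("VOP candidates:", vops)
    print("VEP candidates:", veps)
```

On this synthetic contour the candidates should fall close to samples 100 and 200. Real evidence contours are far less clean, which is the situation the abstract's second stage (epoch-interval uniformity and strength of excitation) is meant to address.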
Pages: 4753-4767
Number of pages: 15