Hierarchical Phoneme Classification for Improved Speech Recognition

Cited: 10
Authors
Oh, Donghoon [1 ,2 ]
Park, Jeong-Sik [3 ]
Kim, Ji-Hwan [4 ]
Jang, Gil-Jin [2 ,5 ]
Affiliations
[1] SK Holdings C&C, Gyeonggi Do 13558, South Korea
[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
[3] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea
[4] Sogang Univ, Dept Comp Sci & Engn, Seoul 04107, South Korea
[5] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea
Source
APPLIED SCIENCES-BASEL, 2021, Vol. 11, Issue 1
Funding
National Research Foundation of Singapore
Keywords
speech recognition; phoneme classification; clustering; recurrent neural networks; neural networks; consonants
DOI
10.3390/app11010428
Chinese Library Classification
O6 [Chemistry]
Discipline Code
0703
Abstract
Featured Application: automatic speech recognition; chatbots; voice-assisted control; multimodal man-machine interaction systems.
Speech recognition converts input sound into a sequence of phonemes and then finds the text for the input using language models. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains a challenging problem even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies recognition models better suited to different phonemes. The phonemes of the TIMIT database are carefully analyzed using a confusion matrix obtained from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In a number of phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2% overall improvement.
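The abstract describes a two-stage scheme: phonemes are grouped according to how often the baseline recognizer confuses them, and a dedicated classifier is then trained for each group. The sketch below is a minimal illustration of that idea, assuming a precomputed confusion matrix and frame-level acoustic features; the clustering method (average-linkage agglomerative clustering) and the per-stage classifiers (logistic regression) are placeholder choices, not the paper's RNN-based models, and all names are illustrative.

```python
# Minimal sketch of confusion-driven phoneme grouping plus a two-stage
# (hierarchical) classifier. This is an illustrative reconstruction, not the
# authors' code: the paper uses RNN-based models on TIMIT, whereas logistic
# regression and average-linkage clustering are stand-ins here.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.linear_model import LogisticRegression


def cluster_phonemes(confusion, n_groups):
    """Assign each phoneme to a group based on how often it is confused."""
    sim = (confusion + confusion.T) / 2.0      # symmetrize the confusion counts
    sim = sim / sim.max()                      # scale similarities into [0, 1]
    dist = 1.0 - sim                           # frequently confused -> close together
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(z, t=n_groups, criterion="maxclust")   # group id per phoneme


class HierarchicalPhonemeClassifier:
    """Stage 1 predicts the phoneme group; stage 2 picks a phoneme inside it."""

    def __init__(self, group_of_phoneme):
        self.group_of_phoneme = np.asarray(group_of_phoneme)
        self.group_model = LogisticRegression(max_iter=1000)
        self.within_models = {}

    def fit(self, X, y):
        # Stage 1: map acoustic features to the coarse phoneme group.
        group_labels = self.group_of_phoneme[y]
        self.group_model.fit(X, group_labels)
        # Stage 2: one specialized model per group of confusable phonemes.
        for g in np.unique(self.group_of_phoneme):
            members = np.flatnonzero(self.group_of_phoneme == g)
            if len(members) == 1:              # singleton group needs no refinement
                self.within_models[g] = int(members[0])
            else:
                mask = group_labels == g
                model = LogisticRegression(max_iter=1000)
                model.fit(X[mask], y[mask])
                self.within_models[g] = model
        return self

    def predict(self, X):
        groups = self.group_model.predict(X)
        out = np.empty(len(X), dtype=int)
        for g in np.unique(groups):
            idx = np.flatnonzero(groups == g)
            model = self.within_models[g]
            out[idx] = model if isinstance(model, int) else model.predict(X[idx])
        return out
```

In the paper itself, the grouping follows a confusion analysis of TIMIT phonemes (fricatives, affricates, stops, nasals, etc.) and the group-specific classifiers are recurrent neural networks; the code above only shows how confusion-based grouping and two-stage routing fit together.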
Pages: 1-17
Page count: 17
Related Papers
50 records in total
  • [21] PHONEME SELECTION FOR STUDIES IN AUTOMATIC SPEECH RECOGNITION
    SHOUP, JE
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1962, 34 (04): 397 - &
  • [22] Phoneme fuzzy characterization in speech recognition systems
    Beritelli, F
    Borrometi, L
    Cuce, A
    APPLICATIONS OF SOFT COMPUTING, 1997, 3165 : 305 - 306
  • [23] Phoneme Confusions in Human and Automatic Speech Recognition
    Meyer, Bernd T.
    Waechter, Matthias
    Brand, Thomas
    Kollmeier, Birger
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007: 2740 - 2743
  • [24] Improving phoneme recognition of telephone quality speech
    Huang, Q
    Cox, S
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004: 445 - 448
  • [25] Phoneme and tonal accent recognition for Thai speech
    Theera-Umpon, Nipon
    Chansareewittaya, Suppakarn
    Auephanwiriyakul, Sansanee
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (10) : 13254 - 13259
  • [26] A Comprehensive Examination of Phoneme Recognition in Automatic Speech Recognition Systems
    Bhatt, Shobha
    Bansal, Shweta
    Kumar, Ankit
    Pandey, Saroj Kumar
    Ojha, Manoj Kumar
    Singh, Kamred Udham
    Chakraborty, Sanjay
    Singh, Teekam
    Swarup, Chetan
    TRAITEMENT DU SIGNAL, 2023, 40 (05) : 1997 - 2008
  • [27] Mouth Shape Sequence Recognition Based on Speech Phoneme Recognition
    Xu, Ming
    Hu, Ruimin
    2006 FIRST INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND NETWORKING IN CHINA, 2006
  • [28] Robust phoneme classification for automatic speech recognition using hybrid features and an amalgamated learning model
    Khwaja, Mohammed Kamal
    Vikash, Peddakota
    Arulmozhivarman, P.
    Lui, Simon
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2016, 19 (04) : 895 - 905
  • [29] Speech/Non-Speech Segmentation Based on Phoneme Recognition Features
    Janez Žibert
    Nikola Pavešić
    France Mihelič
    EURASIP Journal on Advances in Signal Processing, 2006
  • [30] Speech/non-speech segmentation based on phoneme recognition features
    Zibert, Janez
    Pavesic, Nikola
    Mihelic, France
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2006, 2006 (1)