Hierarchical Phoneme Classification for Improved Speech Recognition

Cited by: 10
Authors
Oh, Donghoon [1, 2]
Park, Jeong-Sik [3 ]
Kim, Ji-Hwan [4 ]
Jang, Gil-Jin [2, 5]
Affiliations
[1] SK Holdings C&C, Gyeonggi Do 13558, South Korea
[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
[3] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea
[4] Sogang Univ, Dept Comp Sci & Engn, Seoul 04107, South Korea
[5] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 1
Funding
National Research Foundation of Singapore
Keywords
speech recognition; phoneme classification; clustering; recurrent neural networks; neural networks; consonants
DOI
10.3390/app11010428
CLC number
O6 [Chemistry]
Subject classification code
0703
Abstract
Featured Application: Automatic speech recognition; chatbot; voice-assisted control; multimodal man-machine interaction systems.
Speech recognition consists of converting input sound into a sequence of phonemes and then finding the corresponding text using language models. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains challenging even for state-of-the-art classification methods, and classification errors are hard to recover in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are carefully analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In a number of phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, an overall improvement of 2.2 percentage points.
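The abstract does not say which clustering algorithm groups the phonemes. As a rough illustration only, the sketch below assumes a symmetric confusion-rate similarity and greedy average-linkage agglomerative clustering (a swapped-in technique, not necessarily the authors' method). It then routes an input through a group classifier followed by a group-specific phoneme classifier. The function names, the toy confusion counts, and the stub classifiers are all hypothetical; none of the numbers come from the paper or TIMIT.

```python
from itertools import combinations

# Hypothetical toy data: rows are reference phonemes, columns are recognized
# phonemes, entries are token counts (NOT taken from the paper or TIMIT).
TOY_LABELS = ["s", "sh", "p", "t", "m", "n"]
TOY_CONFUSION = [
    [80, 15, 1, 2, 1, 1],
    [12, 85, 1, 1, 0, 1],
    [1, 0, 70, 25, 3, 1],
    [2, 1, 20, 75, 1, 1],
    [1, 0, 2, 1, 76, 20],
    [1, 1, 1, 2, 18, 77],
]


def confusion_similarity(conf):
    """Symmetric similarity: mean of the two row-normalized confusion rates."""
    n = len(conf)
    rates = []
    for row in conf:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])
    return [
        [(rates[i][j] + rates[j][i]) / 2 if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def cluster_phonemes(conf, labels, num_groups):
    """Greedy average-linkage agglomerative clustering on confusion similarity."""
    sim = confusion_similarity(conf)
    clusters = [[i] for i in range(len(labels))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > num_groups:
        # Merge the pair of clusters whose members are most often confused.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_sim(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations() yields pairs with a < b
    return [sorted(labels[i] for i in c) for c in clusters]


def hierarchical_classify(features, group_classifier, phoneme_classifiers):
    """Two-stage decision: choose a phoneme group, then a phoneme inside it."""
    group_id = group_classifier(features)
    return phoneme_classifiers[group_id](features)


if __name__ == "__main__":
    groups = cluster_phonemes(TOY_CONFUSION, TOY_LABELS, num_groups=3)
    print(groups)  # [['s', 'sh'], ['p', 't'], ['m', 'n']]
```

With these toy counts, the most frequently confused pairs (/p/-/t/, /m/-/n/, /s/-/sh/) merge first. In a real system, each resulting group would get its own trained classifier, and `hierarchical_classify` would receive trained models rather than stubs.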
Pages: 1-17
Number of pages: 17