Hierarchical Phoneme Classification for Improved Speech Recognition

Cited by: 10
Authors
Oh, Donghoon [1, 2]
Park, Jeong-Sik [3 ]
Kim, Ji-Hwan [4 ]
Jang, Gil-Jin [2, 5]
Affiliations
[1] SK Holdings C&C, Gyeonggi Do 13558, South Korea
[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
[3] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea
[4] Sogang Univ, Dept Comp Sci & Engn, Seoul 04107, South Korea
[5] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 01
Funding
National Research Foundation of Singapore;
Keywords
speech recognition; phoneme classification; clustering; recurrent neural networks; NEURAL-NETWORKS; CONSONANTS;
DOI
10.3390/app11010428
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Featured Application: Automatic speech recognition; chatbot; voice-assisted control; multimodal man-machine interaction systems.
Speech recognition consists of converting input sound into a sequence of phonemes and then finding text for the input using language models. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains challenging even for state-of-the-art classification methods, and the resulting classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2% overall improvement.
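The grouping step described in the abstract (deriving phoneme groups from a baseline confusion matrix) can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm the authors used, so the average-linkage agglomerative procedure below is an assumption, the function names are hypothetical, and the confusion counts are invented toy values rather than TIMIT results. Only the grouping step is covered; training group-specific classifiers is not.

```python
from itertools import combinations

# Toy confusion counts (rows: true phoneme, columns: predicted phoneme).
# These numbers are made up for illustration; they are NOT from the paper.
TOY_LABELS = ["s", "z", "m", "n", "p", "b"]
TOY_CONFUSION = {
    "s": {"s": 80, "z": 15, "m": 1, "n": 1, "p": 2, "b": 1},
    "z": {"s": 18, "z": 75, "m": 2, "n": 2, "p": 1, "b": 2},
    "m": {"s": 1, "z": 1, "m": 70, "n": 25, "p": 1, "b": 2},
    "n": {"s": 1, "z": 2, "m": 22, "n": 72, "p": 2, "b": 1},
    "p": {"s": 2, "z": 1, "m": 1, "n": 1, "p": 78, "b": 17},
    "b": {"s": 1, "z": 2, "m": 2, "n": 1, "p": 20, "b": 74},
}


def confusion_similarity(conf, labels):
    """Row-normalise counts, then symmetrise: sim[a][b] = (P(b|a) + P(a|b)) / 2."""
    rate = {}
    for a in labels:
        total = sum(conf[a].values())
        rate[a] = {b: conf[a].get(b, 0) / total for b in labels}
    return {a: {b: 0.5 * (rate[a][b] + rate[b][a]) for b in labels} for a in labels}


def cluster_phonemes(conf, labels, n_groups):
    """Merge the two most mutually confused groups until n_groups remain."""
    sim = confusion_similarity(conf, labels)
    groups = [[p] for p in labels]  # start with one group per phoneme

    def linkage(pair):
        # Average pairwise similarity between the members of two groups.
        g, h = pair
        return sum(sim[a][b] for a in g for b in h) / (len(g) * len(h))

    while len(groups) > n_groups:
        g, h = max(combinations(groups, 2), key=linkage)
        groups.remove(g)
        groups.remove(h)
        groups.append(sorted(g + h))
    return sorted(groups)


print(cluster_phonemes(TOY_CONFUSION, TOY_LABELS, 3))
```

With these toy counts, phonemes that the hypothetical baseline confuses most often end up in the same group, which is the property a hierarchical classifier would rely on when assigning a dedicated model to each group.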
Pages: 1 / 17
Page count: 17
Related papers
50 in total
  • [31] HIERARCHICAL CLASSIFICATION TREE MODELING OF NONSTATIONARY NOISE FOR ROBUST SPEECH RECOGNITION
    Zelinka, Petr
    Sigmund, Milan
    INFORMATION TECHNOLOGY AND CONTROL, 2010, 39 (03): : 202 - 210
  • [32] A Novel Hierarchical Speech Emotion Recognition Method Based on Improved DDAGSVM
    Mao, Qi-rong
    Zhan, Yong-zhao
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2010, 7 (01) : 211 - 221
  • [33] Improved speech recognition via speaker stress directed classification
    Womack, BD
    Hansen, JHL
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 53 - 56
  • [34] Phoneme Set Design for Speech Recognition of English by Japanese
    Wang, Xiaoyun
    Zhang, Jinsong
    Nishida, Masafumi
    Yamamoto, Seiichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2015, E98D (01): : 148 - 156
  • [35] Phoneme and Sentence-Level Ensembles for Speech Recognition
    Dimitrakakis, Christos
    Bengio, Samy
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2011,
  • [36] Robust Phoneme Recognition Based on Biomimetic Speech Contours
    Carlin, Michael A.
    Patil, Kailash
    Nemala, Sridhar Krishna
    Elhilali, Mounya
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1346 - 1349
  • [37] Neural networks for text-to-speech phoneme recognition
    Embrechts, MJ
    Arciniegas, F
    SMC 2000 CONFERENCE PROCEEDINGS: 2000 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOL 1-5, 2000, : 3582 - 3587
  • [38] Exploiting contextual information for improved phoneme recognition
    Pinto, Joel
    Yegnanarayana, A.
    Hermansky, H.
    Magimai-Doss, Mathew
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4449 - +
  • [39] IMPLEMENTATION OF HIERARCHICAL PHONEME CLASSIFICATION APPROACH ON LTDIGITS CORPORA
    Driaunys, Kestutis
    Rudzionis, Vytautas
    Zvinys, Pranas
    INFORMATION TECHNOLOGY AND CONTROL, 2009, 38 (04): : 303 - 310
  • [40] Selected phoneme rejection grammar for a speech recognition system
    Shu, CQ
    ICSP '98: 1998 FOURTH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, PROCEEDINGS, VOLS I AND II, 1998, : 646 - 649