Hierarchical Phoneme Classification for Improved Speech Recognition

Cited by: 10

Authors:

Oh, Donghoon [1,2]

Park, Jeong-Sik [3]

Kim, Ji-Hwan [4]

Jang, Gil-Jin [2,5]

Affiliations:

[1] SK Holdings C&C, Gyeonggi Do 13558, South Korea

[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea

[3] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea

[4] Sogang Univ, Dept Comp Sci & Engn, Seoul 04107, South Korea

[5] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea

Source:

APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 01

Funding:

National Research Foundation of Singapore;

Keywords:

speech recognition; phoneme classification; clustering; recurrent neural networks; NEURAL-NETWORKS; CONSONANTS;

DOI:

10.3390/app11010428

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Featured Application: Automatic speech recognition; chatbot; voice-assisted control; multimodal man-machine interaction systems.

Speech recognition consists of converting input sound into a sequence of phonemes and then finding text for the input using language models. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains a challenging problem even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are carefully analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In a number of phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, an overall improvement of 2.2 percentage points.
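The abstract does not say which clustering algorithm is used, so the following is only a minimal sketch of the general idea: phonemes that a baseline model confuses with each other are merged into one group. It assumes average-linkage agglomerative clustering over a symmetrized confusion rate. The function names, the four-phoneme label set, and the confusion counts are all hypothetical toy values, not results from the paper or from TIMIT.

```python
def confusion_similarity(conf):
    """Turn raw confusion counts into a symmetric similarity matrix.

    conf[i][j] is how often true phoneme i was classified as phoneme j.
    Each row is normalized to rates, then rates i->j and j->i are averaged.
    """
    n = len(conf)
    rates = [[conf[i][j] / max(sum(conf[i]), 1) for j in range(n)]
             for i in range(n)]
    return [[(rates[i][j] + rates[j][i]) / 2 for j in range(n)]
            for i in range(n)]


def cluster_phonemes(labels, conf, n_groups):
    """Greedily merge the two most-confused clusters until n_groups remain.

    Assumption: average linkage, i.e. cluster similarity is the mean
    pairwise similarity between their members.
    """
    sim = confusion_similarity(conf)
    clusters = [[i] for i in range(len(labels))]
    while len(clusters) > n_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                score = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(labels[i] for i in c) for c in clusters]


# Toy data (hypothetical): two fricatives and two nasals.
labels = ["s", "z", "m", "n"]
conf = [
    [80, 15, 3, 2],   # true "s"
    [20, 75, 2, 3],   # true "z"
    [2, 3, 70, 25],   # true "m"
    [1, 4, 30, 65],   # true "n"
]

groups = cluster_phonemes(labels, conf, n_groups=2)
print(groups)
```

With these toy counts the fricatives and the nasals end up in separate groups. In a hierarchical classifier of the kind the abstract describes, each such group would then get its own specialized model.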


Page range: 1 / 17

Page count: 17

