Hierarchical Phoneme Classification for Improved Speech Recognition

Cited by: 10

Authors:

Oh, Donghoon [1,2]

Park, Jeong-Sik [3]

Kim, Ji-Hwan [4]

Jang, Gil-Jin [2,5]

Affiliations:

[1] SK Holdings C&C, Gyeonggi Do 13558, South Korea

[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea

[3] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea

[4] Sogang Univ, Dept Comp Sci & Engn, Seoul 04107, South Korea

[5] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea

Source:

APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / No. 01

Funding:

National Research Foundation of Singapore;

Keywords:

speech recognition; phoneme classification; clustering; recurrent neural networks; NEURAL-NETWORKS; CONSONANTS;

DOI:

10.3390/app11010428

Chinese Library Classification number:

O6 [Chemistry];

Subject classification code:

0703 ;

Abstract:

Featured Application: Automatic speech recognition; chatbot; voice-assisted control; multimodal man-machine interaction systems.

Speech recognition consists of converting input sound into a sequence of phonemes and then finding text for the input using language models. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains challenging even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2 percentage-point overall improvement.
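To make the grouping idea in the abstract concrete, the following is a minimal sketch of clustering phonemes from a confusion matrix. It is an illustration under stated assumptions, not the authors' implementation: the six phonemes, the confusion counts, the symmetric confusion-rate similarity, the average-linkage merging rule, and the names `similarity_matrix` and `cluster_phonemes` are all hypothetical choices made for this example. The abstract does not specify which clustering algorithm or similarity measure the paper uses.

```python
from itertools import combinations

# Hypothetical toy confusion counts (rows: true phoneme, columns: predicted).
# These numbers are invented for illustration; they are not from the paper.
PHONEMES = ["s", "z", "m", "n", "p", "b"]
CONFUSION = [
    [80, 15, 0, 1, 2, 2],    # s
    [18, 75, 1, 1, 2, 3],    # z
    [0, 1, 70, 25, 2, 2],    # m
    [1, 0, 28, 68, 1, 2],    # n
    [2, 1, 1, 1, 78, 17],    # p
    [1, 3, 2, 2, 20, 72],    # b
]


def similarity_matrix(confusion):
    """Return the symmetric confusion rate for each pair of phonemes."""
    n = len(confusion)
    row_totals = [sum(row) for row in confusion]
    sim = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Average the two directed confusion rates (i taken as j, j taken as i).
        rate = 0.5 * (confusion[i][j] / row_totals[i]
                      + confusion[j][i] / row_totals[j])
        sim[i][j] = sim[j][i] = rate
    return sim


def cluster_phonemes(confusion, labels, num_groups):
    """Greedy average-linkage agglomerative clustering on confusion rates."""
    sim = similarity_matrix(confusion)
    clusters = [[i] for i in range(len(labels))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > num_groups:
        # Merge the two clusters whose members are confused most often.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_sim(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [sorted(labels[i] for i in c) for c in clusters]


groups = cluster_phonemes(CONFUSION, PHONEMES, num_groups=3)
print(sorted(groups))
```

With these toy counts, the phonemes that the hypothetical baseline confuses most often end up in the same group. In a hierarchical scheme of the kind the abstract describes, each such group could then be handled by a classifier tuned for that group.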


Pages: 1 / 17

Page count: 17

50 entries in total

[1] Speech recognition through phoneme segmentation and neural classification
Maeran, O
Piuri, V
Gajani, GS
IMTC/97 - IEEE INSTRUMENTATION & MEASUREMENT TECHNOLOGY CONFERENCE: SENSING, PROCESSING, NETWORKING, PROCEEDINGS VOLS 1 AND 2, 1997, : 1215 - 1220
[2] Analysis of Hierarchical Bottleneck Framework for Improved Phoneme Recognition
Zaki, Mohammadi
Sailor, Hardik B.
Patil, Hemant A.
2016 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM), 2016,
[3] Improved Phoneme-Based Myoelectric Speech Recognition
Zhou, Quan
Jiang, Ning
Englehart, Kevin
Hudgins, Bernard
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2009, 56 (08) : 2016 - 2023
[4] Bidirectional LSTM networks for improved phoneme classification and recognition
Graves, A
Fernández, S
Schmidhuber, J
ARTIFICIAL NEURAL NETWORKS: FORMAL MODELS AND THEIR APPLICATIONS - ICANN 2005, PT 2, PROCEEDINGS, 2005, 3697 : 799 - 804
[5] Diagnostics of speech recognition using classification phoneme diagnostic trees
Cernak, Milos
Wellekens, Christian
PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2006, : 459 - +
[6] Myoelectric signal classification for phoneme-based speech recognition
Scheme, Erik J.
Hudgins, Bernard
Parker, Phillip A.
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2007, 54 (04) : 694 - 699
[7] PHONEME GROUPING FOR SPEECH RECOGNITION
REDDY, DR
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1967, 41 (05) : 1295 - &
[8] Hierarchical Phoneme Classifier for Hindi Speech
Singhvi, Abhinav
Gupta, Prashant
Sanyal, Sudip
ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008, : 571 - 574
[9] Classification of myoelectric signal for sub-vocal Hindi phoneme speech recognition
Khan, Munna
Jahan, Mosarrat
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2018, 35 (05) : 5585 - 5592
[10] Non-linear speech feature extraction for phoneme classification and speaker recognition
Chetouani, M
Faundez-Zanuy, M
Gas, B
Zarader, JL
NONLINEAR SPEECH MODELING AND APPLICATIONS, 2005, 3445 : 344 - 350
