Hierarchical Phoneme Classification for Improved Speech Recognition

Cited: 10
Authors
Oh, Donghoon [1 ,2 ]
Park, Jeong-Sik [3 ]
Kim, Ji-Hwan [4 ]
Jang, Gil-Jin [2 ,5 ]
Affiliations
[1] SK Holdings C&C, Gyeonggi Do 13558, South Korea
[2] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
[3] Hankuk Univ Foreign Studies, Dept English Linguist & Language Technol, Seoul 02450, South Korea
[4] Sogang Univ, Dept Comp Sci & Engn, Seoul 04107, South Korea
[5] Kyungpook Natl Univ, Sch Elect Engn, Daegu 41566, South Korea
Source
APPLIED SCIENCES-BASEL, 2021, Vol. 11, Issue 1
Funding
National Research Foundation of Singapore
Keywords
speech recognition; phoneme classification; clustering; recurrent neural networks; neural networks; consonants
DOI
10.3390/app11010428
Chinese Library Classification
O6 [Chemistry]
Discipline Code
0703
Abstract
Featured Application: automatic speech recognition; chatbots; voice-assisted control; multimodal man-machine interaction systems.
Speech recognition converts input sound into a sequence of phonemes and then finds the text for the input using language models. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains a challenging problem even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies recognition models better suited to different phonemes. The phonemes of the TIMIT database are carefully analyzed using a confusion matrix obtained from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In a number of phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2% overall improvement.
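The abstract describes a two-stage scheme: phonemes are grouped according to how often the baseline recognizer confuses them, and a dedicated classifier is then trained for each group. The sketch below is a minimal illustration of that idea, assuming a precomputed confusion matrix and frame-level acoustic features; the clustering method (average-linkage agglomerative clustering) and the per-stage classifiers (logistic regression) are placeholder choices, not the paper's RNN-based models, and all names are illustrative.

```python
# Minimal sketch of confusion-driven phoneme grouping plus a two-stage
# (hierarchical) classifier. This is an illustrative reconstruction, not the
# authors' code: the paper uses RNN-based models on TIMIT, whereas logistic
# regression and average-linkage clustering are stand-ins here.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.linear_model import LogisticRegression


def cluster_phonemes(confusion, n_groups):
    """Assign each phoneme to a group based on how often it is confused."""
    sim = (confusion + confusion.T) / 2.0      # symmetrize the confusion counts
    sim = sim / sim.max()                      # scale similarities into [0, 1]
    dist = 1.0 - sim                           # frequently confused -> close together
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(z, t=n_groups, criterion="maxclust")   # group id per phoneme


class HierarchicalPhonemeClassifier:
    """Stage 1 predicts the phoneme group; stage 2 picks a phoneme inside it."""

    def __init__(self, group_of_phoneme):
        self.group_of_phoneme = np.asarray(group_of_phoneme)
        self.group_model = LogisticRegression(max_iter=1000)
        self.within_models = {}

    def fit(self, X, y):
        # Stage 1: map acoustic features to the coarse phoneme group.
        group_labels = self.group_of_phoneme[y]
        self.group_model.fit(X, group_labels)
        # Stage 2: one specialized model per group of confusable phonemes.
        for g in np.unique(self.group_of_phoneme):
            members = np.flatnonzero(self.group_of_phoneme == g)
            if len(members) == 1:              # singleton group needs no refinement
                self.within_models[g] = int(members[0])
            else:
                mask = group_labels == g
                model = LogisticRegression(max_iter=1000)
                model.fit(X[mask], y[mask])
                self.within_models[g] = model
        return self

    def predict(self, X):
        groups = self.group_model.predict(X)
        out = np.empty(len(X), dtype=int)
        for g in np.unique(groups):
            idx = np.flatnonzero(groups == g)
            model = self.within_models[g]
            out[idx] = model if isinstance(model, int) else model.predict(X[idx])
        return out
```

In the paper itself, the grouping follows a confusion analysis of TIMIT phonemes (fricatives, affricates, stops, nasals, etc.) and the group-specific classifiers are recurrent neural networks; the code above only shows how confusion-based grouping and two-stage routing fit together.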
Pages: 1-17
Page count: 17
Related Papers
50 records in total
  • [21] PHONEME SELECTION FOR STUDIES IN AUTOMATIC SPEECH RECOGNITION
    SHOUP, JE
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1962, 34 (04): 397 - &
  • [22] Phoneme fuzzy characterization in speech recognition systems
    Beritelli, F
    Borrometi, L
    Cuce, A
    APPLICATIONS OF SOFT COMPUTING, 1997, 3165 : 305 - 306
  • [23] Phoneme Confusions in Human and Automatic Speech Recognition
    Meyer, Bernd T.
    Waechter, Matthias
    Brand, Thomas
    Kollmeier, Birger
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007: 2740 - 2743
  • [24] Improving phoneme recognition of telephone quality speech
    Huang, Q
    Cox, S
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004: 445 - 448
  • [25] Phoneme and tonal accent recognition for Thai speech
    Theera-Umpon, Nipon
    Chansareewittaya, Suppakarn
    Auephanwiriyakul, Sansanee
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (10) : 13254 - 13259
  • [26] A Comprehensive Examination of Phoneme Recognition in Automatic Speech Recognition Systems
    Bhatt, Shobha
    Bansal, Shweta
    Kumar, Ankit
    Pandey, Saroj Kumar
    Ojha, Manoj Kumar
    Singh, Kamred Udham
    Chakraborty, Sanjay
    Singh, Teekam
    Swarup, Chetan
    TRAITEMENT DU SIGNAL, 2023, 40 (05) : 1997 - 2008
  • [27] Mouth Shape Sequence Recognition Based on Speech Phoneme Recognition
    Xu, Ming
    Hu, Ruimin
    2006 FIRST INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND NETWORKING IN CHINA, 2006
  • [28] Robust phoneme classification for automatic speech recognition using hybrid features and an amalgamated learning model
    Khwaja, Mohammed Kamal
    Vikash, Peddakota
    Arulmozhivarman, P.
    Lui, Simon
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2016, 19 (04) : 895 - 905
  • [29] Speech/Non-Speech Segmentation Based on Phoneme Recognition Features
    Janez Žibert
    Nikola Pavešić
    France Mihelič
    EURASIP Journal on Advances in Signal Processing, 2006
  • [30] Speech/non-speech segmentation based on phoneme recognition features
    Zibert, Janez
    Pavesic, Nikola
    Mihelic, France
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2006, 2006 (1)