An automatic speech recognition system in Odia language using attention mechanism and data augmentation

Cited by: 0

Authors:

Malay Kumar Majhi [1]

Sujan Kumar Saha [1]

Affiliation:

[1] National Institute of Technology Durgapur, Department of CSE

Source:

International Journal of Speech Technology | 2024 / Vol. 27 / Issue 3

Keywords:

Automatic speech recognition; Continuous speech recognition; Odia ASR; Data augmentation; Attention network

DOI:

10.1007/s10772-024-10132-6

Abstract:

This paper presents an automatic speech recognition (ASR) system developed for the Indian language Odia. In recent years, deep learning models have been widely used to develop ASR systems in various languages and domains. These models demand large training resources, primarily annotated continuous speech utterances collected from many speakers. However, sufficiently large speech corpora are not available for many Indian languages. This paper explores the effectiveness of data augmentation in overcoming data scarcity in the Odia ASR task. The baseline system is developed using BiLSTM and the Seq2Seq framework. Next, a portion of the training data is selected based on phonetic richness, and augmentation techniques such as pitch alteration and time stretching are applied to it. The augmented data is used along with the original training data, and a substantial performance improvement is observed. The effectiveness of the attention mechanism in Odia ASR is also explored. When an attention layer is added to the baseline BiLSTM model, the resulting system outperforms both the baseline model and the existing Odia ASR systems in the literature.
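
The abstract names pitch alteration and time stretching as augmentation techniques but does not say how they are implemented. The following is a minimal sketch, not the authors' method: it assumes a naive overlap-add (OLA) time stretch and builds the pitch shift from that stretch plus linear-interpolation resampling. The function names, the frame and hop sizes, and the semitone parameter are all hypothetical choices made for illustration.

```python
import math


def hann(n):
    """Hann window of length n (tapers each frame to zero at its start)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def time_stretch_ola(x, rate, frame=256, hop_out=64):
    """Naive overlap-add time stretch (illustrative assumption).

    rate > 1 shortens the signal (faster speech); rate < 1 lengthens it.
    Frames are read every hop_out * rate samples and written every
    hop_out samples, so the duration changes by about 1 / rate.
    """
    hop_in = hop_out * rate
    win = hann(frame)
    n_frames = max(1, int((len(x) - frame) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        start_in = int(round(k * hop_in))
        start_out = k * hop_out
        for i in range(frame):
            j = start_in + i
            if j >= len(x):
                break
            out[start_out + i] += x[j] * win[i]
            norm[start_out + i] += win[i]
    # Divide by the summed window weights so overlapping frames average out.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def resample_linear(x, step):
    """Read x at positions 0, step, 2*step, ... with linear interpolation."""
    n_out = int(len(x) / step)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * x[lo] + frac * x[hi])
    return out


def pitch_shift(x, semitones, frame=256, hop_out=64):
    """Shift pitch by `semitones` while keeping roughly the same duration.

    The signal is first lengthened by `factor`, then read back `factor`
    times faster, which scales frequencies by `factor`.
    """
    factor = 2.0 ** (semitones / 12.0)
    stretched = time_stretch_ola(x, 1.0 / factor, frame, hop_out)
    return resample_linear(stretched, factor)
```

Applied to a list of float samples, `time_stretch_ola(samples, 1.1)` would yield a slightly faster copy and `pitch_shift(samples, 2)` a copy raised by two semitones, each of which could be paired with the original transcript as an extra training utterance. Plain OLA ignores phase alignment between frames and introduces audible artifacts, so a practical pipeline would more likely rely on a phase vocoder or WSOLA implementation from an audio library.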

Pages: 717-728

Page count: 11
