An automatic speech recognition system in Odia language using attention mechanism and data augmentation

Cited by: 0
Authors
Malay Kumar Majhi [1 ]
Sujan Kumar Saha [1 ]
Affiliations
[1] Department of CSE, National Institute of Technology Durgapur
Keywords
Automatic speech recognition; Continuous speech recognition; Odia ASR; Data augmentation; Attention network;
DOI
10.1007/s10772-024-10132-6
Abstract
This paper presents an automatic speech recognition (ASR) system developed for the Indian language Odia. In recent years, deep learning models have been widely used to build ASR systems across languages and domains, but these models demand large training resources, primarily annotated continuous speech utterances collected from many speakers. However, a sufficient speech corpus is not available for many Indian languages. This paper explores the effectiveness of data augmentation in overcoming data scarcity in the Odia ASR task. The baseline system is built on a BiLSTM network within the Seq2Seq framework. Next, a portion of the training data is selected based on phonetic richness, and augmentation techniques such as pitch alteration and time stretching are applied to it. Training on the augmented data together with the original data yields a substantial performance improvement. The effectiveness of the attention mechanism in Odia ASR is also explored: when an attention layer is embedded in the baseline BiLSTM model, the resulting system outperforms both the baseline and existing Odia ASR systems reported in the literature.
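The paper does not publish code, but the time-stretching augmentation mentioned in the abstract can be illustrated with a minimal NumPy sketch. Note the caveat in the comments: naive linear resampling changes pitch along with duration (so it crudely illustrates pitch alteration too), whereas production pipelines such as librosa's phase-vocoder effects decouple the two. The function name and toy signal below are illustrative, not from the paper.

```python
import numpy as np

def time_stretch(signal, rate):
    """Crudely change playback speed by linear resampling.

    rate > 1 shortens the signal, rate < 1 lengthens it.
    Caveat: plain resampling also shifts pitch; dedicated
    augmentation tools keep pitch and duration independent.
    """
    n_out = int(len(signal) / rate)
    # positions in the original signal to sample at
    old_idx = np.linspace(0, len(signal) - 1, num=n_out)
    return np.interp(old_idx, np.arange(len(signal)), signal)

# toy example: a 1-second 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

faster = time_stretch(tone, rate=1.25)  # 0.8 s
slower = time_stretch(tone, rate=0.5)   # 2.0 s
print(len(tone), len(faster), len(slower))  # 16000 12800 32000
```

In a real augmentation pipeline each transformed utterance would be added to the training set alongside the original, which is the strategy the abstract credits for the performance improvement.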
Pages: 717 - 728
Number of pages: 11
Related papers
50 records in total
  • [21] A DCRNN-based ensemble classifier for speech emotion recognition in Odia language
    Swain, Monorama
    Maji, Bubai
    Kabisatpathy, P.
    Routray, Aurobinda
    COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (05) : 4237 - 4249
  • [23] A new language model for an automatic Arabic speech recognition system
    Rashwan, M.
    Journal of Engineering and Applied Science, 2002, 49 (01): : 175 - 193
  • [24] Grammar based automatic speech recognition system for the Polish language
    Korzinek, Danijel
    Brocki, Lukasz
    RECENT ADVANCES IN MECHATRONICS, 2007, : 87 - +
  • [25] SARMATA 2.0 Automatic Polish Language Speech Recognition System
    Ziolko, Bartosz
    Jadczyk, Tomasz
    Skurzok, Dawid
    Zelasko, Piotr
    Galka, Jakub
    Pedzimaz, Tomasz
    Gawlik, Ireneusz
    Palka, Szymon
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1062 - +
  • [26] Speech Command Classification System for Sinhala Language based on Automatic Speech Recognition
    Dinushika, Thilini
    Kavmini, Lakshika
    Abeyawardhana, Pamoda
    Thayasivam, Uthayasanker
    Jayasena, Sanath
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 205 - 210
  • [27] Creating Language and Acoustic Models using Kaldi to Build An Automatic Speech Recognition System for Kannada Language
    Yadava, Thimmaraja G.
    Jayanna, H. S.
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRONICS, INFORMATION & COMMUNICATION TECHNOLOGY (RTEICT), 2017, : 161 - 165
  • [28] Improving Automatic Speech Recognition Utilizing Audio-codecs for Data Augmentation
    Hailu, Nirayo
    Siegert, Ingo
    Nurnberger, Andreas
    2020 IEEE 22ND INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2020,
  • [29] AugMixSpeech: A Data Augmentation Method and Consistency Regularization for Mandarin Automatic Speech Recognition
    Jiang, Yang
    Chen, Jun
    Han, Kai
    Liu, Yi
    Ma, Siqi
    Song, Yuqing
    Liu, Zhe
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT III, NLPCC 2024, 2025, 15361 : 145 - 157
  • [30] Using morphemes in language modeling and automatic speech recognition of Amharic
    Tachbelie, Martha Yifiru
    1600, Cambridge University Press (20):