An automatic speech recognition system in Odia language using attention mechanism and data augmentation

Cited by: 0

Authors:

Malay Kumar Majhi [1]

Sujan Kumar Saha [1]

Affiliation:

[1] National Institute of Technology Durgapur, Department of CSE

Source:

International Journal of Speech Technology | 2024 / Vol. 27 / No. 3

Keywords:

Automatic speech recognition; Continuous speech recognition; Odia ASR; Data augmentation; Attention network

DOI:

10.1007/s10772-024-10132-6

Abstract:

This paper presents an automatic speech recognition (ASR) system for Odia, an Indian language. In recent years, deep learning models have been widely used to develop ASR systems across languages and domains. These models demand large training resources, primarily annotated continuous speech utterances collected from many speakers. However, sufficiently large speech corpora are not available for many Indian languages. This paper explores the effectiveness of data augmentation in overcoming data scarcity in the Odia ASR task. The baseline system is developed using a BiLSTM network in a Seq2Seq framework. Next, a portion of the training data is selected based on phonetic richness, and augmentation techniques such as pitch alteration and time stretching are applied to it. The augmented data is used alongside the original training data, and a substantial performance improvement is observed. The effectiveness of the attention mechanism in Odia ASR is also explored: when an attention layer is embedded in the baseline BiLSTM model, the resulting system outperforms both the baseline and the existing Odia ASR systems reported in the literature.
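
The abstract names pitch alteration and time stretching as the augmentation techniques but does not say how they were implemented. Below is a minimal NumPy sketch, assuming a phase-vocoder approach, which is one common way to stretch speech without changing its pitch. The functions `stft`, `istft`, `time_stretch` and `pitch_shift`, and the STFT settings `N_FFT = 512` and `HOP = 128`, are hypothetical and not taken from the paper. Pitch alteration is built as a time stretch followed by resampling (the same overall recipe librosa uses); the linear-interpolation resampler here has no anti-aliasing filter, so treat this as illustrative rather than production quality.

```python
import numpy as np

# Hypothetical analysis settings; the paper's settings are not given in the abstract.
N_FFT, HOP = 512, 128


def stft(y, n_fft=N_FFT, hop=HOP):
    """Hann-windowed STFT without padding; returns an array of shape (frames, bins)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop:t * hop + n_fft] * win for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(S, n_fft=N_FFT, hop=HOP):
    """Weighted overlap-add inverse of stft()."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(S, n=n_fft, axis=1) * win
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for t, frame in enumerate(frames):
        out[t * hop:t * hop + n_fft] += frame
        norm[t * hop:t * hop + n_fft] += win ** 2
    # Avoid dividing by near-zero window sums at the signal edges.
    return out / np.where(norm > 1e-3, norm, 1.0)


def time_stretch(y, rate, n_fft=N_FFT, hop=HOP):
    """Phase-vocoder time stretch: rate > 1 shortens, rate < 1 lengthens; pitch is kept."""
    if len(y) < n_fft + hop:
        raise ValueError("signal too short for the chosen STFT settings")
    S = stft(y, n_fft, hop)
    # Expected phase advance of each frequency bin over one hop.
    omega = 2 * np.pi * hop * np.arange(S.shape[1]) / n_fft
    phase = np.angle(S[0])
    steps = np.arange(0, S.shape[0] - 1, rate)
    out = np.empty((len(steps), S.shape[1]), dtype=complex)
    for i, s in enumerate(steps):
        t, frac = int(s), s - int(s)
        # Interpolate the magnitude between neighbouring analysis frames.
        mag = (1 - frac) * np.abs(S[t]) + frac * np.abs(S[t + 1])
        out[i] = mag * np.exp(1j * phase)
        # Deviation from the expected phase advance, wrapped to [-pi, pi].
        dphi = np.angle(S[t + 1]) - np.angle(S[t]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + omega + dphi
    return istft(out, n_fft, hop)


def pitch_shift(y, n_steps):
    """Shift pitch by n_steps semitones while keeping the duration."""
    factor = 2.0 ** (n_steps / 12)
    stretched = time_stretch(y, rate=1 / factor)  # about `factor` times longer
    # Read the stretched signal `factor` times faster (linear interpolation,
    # no anti-aliasing filter); positions past its end are filled with zeros.
    positions = np.arange(len(y)) * factor
    return np.interp(positions, np.arange(len(stretched)), stretched, right=0.0)
```

For a 1 s, 220 Hz test tone sampled at 16 kHz, `time_stretch(y, rate=0.8)` returns roughly 1.25 s of audio that is still at 220 Hz, and `pitch_shift(y, n_steps=12)` returns 1 s of audio one octave higher. Applying each transform with a few different factors to the selected phonetically rich utterances is one way to produce extra training copies of the kind the abstract describes.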

Pages: 717-728

Page count: 11
