An automatic speech recognition system in Odia language using attention mechanism and data augmentation

Cited by: 0

Authors:

Malay Kumar Majhi [1]

Sujan Kumar Saha [1]

Affiliations:

[1] National Institute of Technology Durgapur, Department of CSE

Source:

International Journal of Speech Technology | 2024 / Volume 27 / Issue 3

Keywords:

Automatic speech recognition; Continuous speech recognition; Odia ASR; Data augmentation; Attention network

DOI:

10.1007/s10772-024-10132-6

Abstract:

This paper presents an automatic speech recognition (ASR) system developed for the Indian language Odia. In recent years, deep learning models have been widely used to develop ASR systems for various languages and domains. These models demand large training resources, primarily annotated continuous speech utterances collected from many speakers. However, sufficiently large speech corpora are not available for many Indian languages. This paper explores the effectiveness of data augmentation in overcoming data scarcity in the Odia ASR task. The baseline system is built with a BiLSTM in a Seq2Seq framework. Next, a portion of the training data is selected based on phonetic richness, and augmentation techniques such as pitch alteration and time stretching are applied to it. The augmented data is used alongside the original training data, and a substantial performance improvement is observed. The effectiveness of the attention mechanism in Odia ASR is also explored. When an attention layer is added to the baseline BiLSTM model, the resulting system outperforms both the baseline and the existing Odia ASR systems in the literature.
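The abstract does not say which attention variant the authors use. As a minimal sketch, assuming plain dot-product attention over the per-frame outputs of a BiLSTM encoder, the weighting step could look like the following. The function name `attention_context` and the variables `encoder_states` and `decoder_state` are illustrative and do not come from the paper, and the random arrays merely stand in for real encoder outputs.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / exps.sum()


def attention_context(encoder_states, decoder_state):
    """Dot-product attention (an assumed variant, not confirmed by the paper).

    encoder_states: array of shape (T, d), one vector per input frame.
    decoder_state:  array of shape (d,), the current decoder hidden state.
    Returns the attention weights (T,) and the context vector (d,).
    """
    scores = encoder_states @ decoder_state   # one similarity score per frame
    weights = softmax(scores)                 # normalise scores to sum to 1
    context = weights @ encoder_states        # weighted sum of encoder states
    return weights, context


# Hypothetical stand-ins for BiLSTM outputs: 6 frames, 4-dimensional states.
rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((6, 4))
decoder_state = rng.standard_normal(4)

weights, context = attention_context(encoder_states, decoder_state)
```

At each decoding step the decoder receives `context` in addition to its own state, so it can focus on the input frames most relevant to the symbol being predicted rather than relying on a single fixed summary of the utterance.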

Pages: 717-728

Page count: 11
