An automatic speech recognition system in Odia language using attention mechanism and data augmentation

Cited by: 0
Authors
Malay Kumar Majhi [1 ]
Sujan Kumar Saha [1 ]
Affiliations
[1] Department of CSE, National Institute of Technology Durgapur
Keywords
Automatic speech recognition; Continuous speech recognition; Odia ASR; Data augmentation; Attention network;
DOI
10.1007/s10772-024-10132-6
Abstract
This paper presents an automatic speech recognition (ASR) system developed for the Indian language Odia. In recent years, deep learning models have been widely used to build ASR systems across languages and domains, but these models demand large training resources, primarily annotated continuous speech utterances collected from many speakers. However, a sufficient speech corpus is not available for many Indian languages. This paper explores the effectiveness of data augmentation in overcoming data scarcity in the Odia ASR task. The baseline system is built on a BiLSTM network within the Seq2Seq framework. Next, a portion of the training data is selected based on phonetic richness, and augmentation techniques such as pitch alteration and time stretching are applied to it. Training on the augmented data together with the original data yields a substantial performance improvement. The effectiveness of the attention mechanism in Odia ASR is also explored: when an attention layer is embedded in the baseline BiLSTM model, the resulting system outperforms both the baseline and existing Odia ASR systems reported in the literature.
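The paper does not publish code, but the time-stretching augmentation mentioned in the abstract can be illustrated with a minimal NumPy sketch. Note the caveat in the comments: naive linear resampling changes pitch along with duration (so it crudely illustrates pitch alteration too), whereas production pipelines such as librosa's phase-vocoder effects decouple the two. The function name and toy signal below are illustrative, not from the paper.

```python
import numpy as np

def time_stretch(signal, rate):
    """Crudely change playback speed by linear resampling.

    rate > 1 shortens the signal, rate < 1 lengthens it.
    Caveat: plain resampling also shifts pitch; dedicated
    augmentation tools keep pitch and duration independent.
    """
    n_out = int(len(signal) / rate)
    # positions in the original signal to sample at
    old_idx = np.linspace(0, len(signal) - 1, num=n_out)
    return np.interp(old_idx, np.arange(len(signal)), signal)

# toy example: a 1-second 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

faster = time_stretch(tone, rate=1.25)  # 0.8 s
slower = time_stretch(tone, rate=0.5)   # 2.0 s
print(len(tone), len(faster), len(slower))  # 16000 12800 32000
```

In a real augmentation pipeline each transformed utterance would be added to the training set alongside the original, which is the strategy the abstract credits for the performance improvement.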
Pages: 717 - 728
Number of pages: 11
Related papers
50 records in total
  • [21] A DCRNN-based ensemble classifier for speech emotion recognition in Odia language
    Swain, Monorama
    Maji, Bubai
    Kabisatpathy, P.
    Routray, Aurobinda
    COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (05) : 4237 - 4249
  • [23] A new language model for an automatic Arabic speech recognition system
    Rashwan, M.
    Journal of Engineering and Applied Science, 2002, 49 (01): : 175 - 193
  • [24] Grammar based automatic speech recognition system for the Polish language
    Korzinek, Danijel
    Brocki, Lukasz
    RECENT ADVANCES IN MECHATRONICS, 2007, : 87 - +
  • [25] SARMATA 2.0 Automatic Polish Language Speech Recognition System
    Ziolko, Bartosz
    Jadczyk, Tomasz
    Skurzok, Dawid
    Zelasko, Piotr
    Galka, Jakub
    Pedzimaz, Tomasz
    Gawlik, Ireneusz
    Palka, Szymon
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1062 - +
  • [26] Speech Command Classification System for Sinhala Language based on Automatic Speech Recognition
    Dinushika, Thilini
    Kavmini, Lakshika
    Abeyawardhana, Pamoda
    Thayasivam, Uthayasanker
    Jayasena, Sanath
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 205 - 210
  • [27] Creating Language and Acoustic Models using Kaldi to Build An Automatic Speech Recognition System for Kannada Language
    Yadava, Thimmaraja G.
    Jayanna, H. S.
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRONICS, INFORMATION & COMMUNICATION TECHNOLOGY (RTEICT), 2017, : 161 - 165
  • [28] Improving Automatic Speech Recognition Utilizing Audio-codecs for Data Augmentation
    Hailu, Nirayo
    Siegert, Ingo
    Nurnberger, Andreas
    2020 IEEE 22ND INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2020,
  • [29] AugMixSpeech: A Data Augmentation Method and Consistency Regularization for Mandarin Automatic Speech Recognition
    Jiang, Yang
    Chen, Jun
    Han, Kai
    Liu, Yi
    Ma, Siqi
    Song, Yuqing
    Liu, Zhe
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT III, NLPCC 2024, 2025, 15361 : 145 - 157
  • [30] Using morphemes in language modeling and automatic speech recognition of Amharic
    Tachbelie, Martha Yifiru
    1600, Cambridge University Press (20):