An automatic speech recognition system in Odia language using attention mechanism and data augmentation

Authors
Malay Kumar Majhi [1]
Sujan Kumar Saha [1]
Affiliation
[1] Department of CSE, National Institute of Technology Durgapur
Keywords
Automatic speech recognition; Continuous speech recognition; Odia ASR; Data augmentation; Attention network
DOI
10.1007/s10772-024-10132-6
Abstract
This paper presents an automatic speech recognition (ASR) system for Odia, an Indian language. In recent years, deep learning models have been widely used to build ASR systems across many languages and domains. These models require large training resources, primarily annotated continuous speech utterances collected from many speakers. However, sufficiently large speech corpora are not available for many Indian languages. This paper explores how effectively data augmentation overcomes data scarcity in the Odia ASR task. The baseline system is built using a BiLSTM within a sequence-to-sequence (Seq2Seq) framework. Next, a portion of the training data is selected based on phonetic richness, and augmentation techniques such as pitch alteration and time stretching are applied to it. Using the augmented data together with the original training data yields a substantial performance improvement. The paper also examines the effectiveness of the attention mechanism for Odia ASR: when an attention layer is added to the baseline BiLSTM model, the resulting system outperforms both the baseline and the existing Odia ASR systems reported in the literature.
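The abstract does not state how "phonetic richness" is measured when choosing the subset to augment. A common approach, assumed here and not confirmed by the paper, is greedy set coverage: repeatedly pick the utterance that adds the most phones and diphones not yet covered. The sketch below is written under that assumption; the function names `phone_units` and `select_phonetically_rich`, the `budget` parameter, and the toy phone sequences are all hypothetical.

```python
from typing import Dict, List, Sequence, Set


def phone_units(phones: Sequence[str], use_diphones: bool = True) -> Set[str]:
    """Return the phonetic units in one utterance: its phones, plus
    adjacent-phone pairs (diphones) written as "a-b" if requested."""
    units = set(phones)
    if use_diphones:
        units.update(f"{a}-{b}" for a, b in zip(phones, phones[1:]))
    return units


def select_phonetically_rich(
    corpus: Dict[str, Sequence[str]], budget: int, use_diphones: bool = True
) -> List[str]:
    """Greedily pick up to `budget` utterance IDs, each time taking the one
    that adds the most not-yet-covered units (hypothetical criterion)."""
    unit_sets = {uid: phone_units(p, use_diphones) for uid, p in corpus.items()}
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = set(unit_sets)
    while remaining and len(selected) < budget:
        # max() keeps the first maximum it sees, so iterating over sorted IDs
        # breaks ties by the smallest utterance ID, making results deterministic.
        best = max(sorted(remaining), key=lambda uid: len(unit_sets[uid] - covered))
        if not unit_sets[best] - covered:
            break  # no remaining utterance adds new coverage
        selected.append(best)
        covered |= unit_sets[best]
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy romanized phone sequences for illustration only (not real Odia data).
    toy = {
        "utt1": ["k", "a", "m", "a"],
        "utt2": ["k", "a", "l", "a"],
        "utt3": ["s", "o", "n", "a", "r"],
    }
    print(select_phonetically_rich(toy, budget=2))
```

On the toy corpus this selects `utt3` first (nine new units) and then `utt1`. The chosen subset could then be passed to an off-the-shelf pitch-shifting and time-stretching routine, for example `librosa.effects.pitch_shift` and `librosa.effects.time_stretch`. The abstract does not say which tool or parameter values the authors used.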
Pages: 717–728