An automatic speech recognition system in Odia language using attention mechanism and data augmentation

Authors
Malay Kumar Majhi [1]
Sujan Kumar Saha [1]
Affiliations
[1] National Institute of Technology Durgapur, Department of CSE
Keywords
Automatic speech recognition; Continuous speech recognition; Odia ASR; Data augmentation; Attention network
DOI
10.1007/s10772-024-10132-6
Abstract
This paper presents an automatic speech recognition (ASR) system developed for the Indian language Odia. In recent years, deep learning models have been widely used to develop ASR systems in various languages and domains. These models demand large training resources, primarily annotated continuous speech utterances collected from various speakers. However, a sufficiently large speech corpus is not available for many Indian languages. This paper explores the effectiveness of data augmentation in overcoming data scarcity in the Odia ASR task. The baseline system is developed using BiLSTM and the Seq2Seq framework. Next, a portion of the training data is selected based on phonetic richness, and augmentation techniques such as pitch alteration and time stretching are applied to it. The augmented data is used along with the original training data, and a substantial performance improvement is observed. The effectiveness of the attention mechanism in Odia ASR is also explored. When an attention layer is embedded in the baseline BiLSTM model, the system outperforms the baseline model and the existing Odia ASR systems in the literature.
Pages: 717-728
Number of pages: 11