An automatic speech recognition system in Odia language using attention mechanism and data augmentation

Authors
Malay Kumar Majhi [1]
Sujan Kumar Saha [1]
Affiliation
[1] National Institute of Technology Durgapur, Department of CSE
Keywords
Automatic speech recognition; Continuous speech recognition; Odia ASR; Data augmentation; Attention network
DOI
10.1007/s10772-024-10132-6
Abstract
This paper presents an automatic speech recognition (ASR) system developed for the Indian language Odia. In recent years, deep learning models have been widely used to develop ASR systems in various languages and domains. These models demand huge training resources, primarily annotated continuous speech utterances collected from various speakers. However, a sufficient speech corpus is not available for many Indian languages. This paper explores the effectiveness of data augmentation in overcoming data scarcity in the Odia ASR task. The baseline system is developed using BiLSTM and the Seq2Seq framework. Next, a portion of the training data is selected based on phonetic richness, and certain augmentation techniques like pitch alteration and time stretching are applied. The augmented data is used along with the actual training data, and a substantial performance improvement is observed. The effectiveness of the attention mechanism in Odia ASR is also explored. When the system is trained through an attention layer embedded with the baseline BiLSTM model, it outperforms the baseline model and existing Odia ASR systems in the literature.
Pages: 717-728
Number of pages: 11