Robust Multi-Dialect End-to-End ASR Model Jointly with Beam Search Threshold Pruning and LLM

Authors
M. C. Shunmuga Priya [1]
D. Karthika Renuka [2]
L. Ashok Kumar [3]
Affiliations
[1] Amrita School of Computing, Department of Computer Science and Engineering
[2] Amrita Vishwa Vidyapeetham, Department of Information Technology
[3] PSG College of Technology, Department of Electrical and Electronics Engineering
[4] Thiagarajar College of Engineering
Keywords
Automatic speech recognition; Log Mel filter bank energies; Beam search; Decoding; LLM
DOI
10.1007/s42979-025-03794-9
Abstract
This paper develops a novel, robust multi-dialect end-to-end automatic speech recognition (ASR) system with beam search threshold pruning. The proposed model is evaluated using word error rate (WER). The key contributions are: (1) an end-to-end ASR system built on an attention-based neural network architecture, with an analysis of the effectiveness of two features, MFCC and log Mel filter bank energies, on multiple speech dialect corpora covering American, British, and Indian accents; (2) beam search threshold pruning integrated as the decoding mechanism to reduce decoding time; (3) an experimental analysis that tests model performance and compares the results against a baseline system; and (4) post-processing with a Llama2-7B-based large language model (LLM) to enhance the performance of the proposed ASR system. The proposed model improves performance by 1.91% and 4.29% on clean and noisy speech, respectively, in the LibriSpeech corpus. For Indian-accented speech, the model attains an average WER of about 6.6%.
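The abstract reports results as word error rate (WER) but does not define it. As a reminder of the standard definition, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a minimal illustration of that standard metric; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.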
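The abstract names beam search threshold pruning as the decoding mechanism but does not state the pruning rule. The following is a generic sketch of one common form of the idea, under stated assumptions: on top of the usual beam-width cap, any hypothesis whose cumulative log-probability falls more than `threshold` below the current best is discarded, so fewer hypotheses are expanded at the next step. The names `beam_search_threshold`, `step_log_probs`, and `toy_model`, and all parameter values, are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, cumulative log-probability)


def beam_search_threshold(
    step_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    max_len: int,
    beam_width: int = 4,
    threshold: float = 2.0,
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Beam search that also prunes hypotheses scoring far below the best one."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []

    for _ in range(max_len):
        # Expand every live hypothesis by every candidate next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for token, log_prob in step_log_probs(tokens).items():
                candidates.append((tokens + (token,), score + log_prob))
        if not candidates:
            break

        candidates.sort(key=lambda h: h[1], reverse=True)
        best_score = candidates[0][1]

        # Standard beam cap, then threshold pruning relative to the best score.
        kept = [h for h in candidates[:beam_width] if h[1] >= best_score - threshold]

        # Hypotheses ending in the end-of-sequence token are set aside.
        beams = []
        for hyp in kept:
            (finished if hyp[0][-1] == eos else beams).append(hyp)
        if not beams:
            break

    finished.extend(beams)  # keep any hypotheses cut off at max_len
    return sorted(finished, key=lambda h: h[1], reverse=True)


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical stand-in for a decoder's next-token log-probabilities."""
    if not prefix:
        return {"a": math.log(0.6), "b": math.log(0.39), "c": math.log(0.01)}
    return {"</s>": math.log(0.9), "a": math.log(0.1)}


if __name__ == "__main__":
    for tokens, score in beam_search_threshold(toy_model, max_len=5):
        print(tokens, round(score, 3))
```

In this toy run the low-probability first token `c` is dropped by the threshold even though the beam width of 4 would have admitted it, which is the source of the decoding-time saving: pruned hypotheses are never expanded.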