Robust Multi-Dialect End-to-End ASR Model Jointly with Beam Search Threshold Pruning and LLM

Cited by: 0

Authors:

M. C. Shunmuga Priya [1]

D. Karthika Renuka [2]

L. Ashok Kumar [3]

Affiliations:

[1] Amrita School of Computing, Department of Computer Science and Engineering

[2] Amrita Vishwa Vidyapeetham, Department of Information Technology

[3] PSG College of Technology, Department of Electrical and Electronics Engineering

[4] Thiagarajar College of Engineering

Source:

SN Computer Science, Volume 6, Issue 4

Keywords:

Automatic speech recognition; Log Mel filter bank energies; Beam search; Decoding; LLM

DOI:

10.1007/s42979-025-03794-9

Abstract:

This paper develops a novel, robust multi-dialect end-to-end ASR system with beam search threshold pruning. The efficacy of the proposed model is evaluated using word error rate (WER). The key contributions are: (1) an end-to-end ASR system built on an attention-based neural network architecture, with an analysis of the effectiveness of two features, MFCC and log Mel filter bank energies, on multiple speech dialect corpora covering American, British, and Indian accents; (2) beam search threshold pruning integrated as a decoding mechanism to reduce decoding time; (3) an experimental analysis that tests model performance and compares the results against a baseline system; and (4) post-processing with a Llama2-7B-based large language model (LLM) to enhance the performance of the proposed ASR system. The proposed model significantly improves performance by 1.91% and 4.29% on clean and noisy speech, respectively, in the LibriSpeech corpus. Similarly, for Indian-accented speech, the model attains an average WER of about 6.6%.
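For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the record does not say which scoring tool or text normalization the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization (casing, punctuation), which real scoring tools often do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER = {wer:.2%}")
```

Under this definition, a WER of about 6.6% corresponds to roughly 6.6 word errors per 100 reference words.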
