Attention based end to end Speech Recognition for Voice Search in Hindi and English

Cited by: 4
Authors
Joshi, Raviraj [1]
Kannan, Venkateshan [1]
Affiliations
[1] Flipkart, Bengaluru, India
Keywords
automatic speech recognition; encoder-decoder models; attention; listen attend spell; SYSTEM; CHALLENGE;
DOI
10.1145/3503162.3503173
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We describe here our work with automatic speech recognition (ASR) in the context of voice search functionality on the Flipkart e-Commerce platform. Starting with the deep learning architecture of Listen-Attend-Spell (LAS), we build upon and expand the model design and attention mechanisms to incorporate innovative approaches including multi-objective training, multi-pass training, and external rescoring using language models and phoneme based losses. We report a relative WER improvement of 15.7% on top of state-of-the-art LAS models using these modifications. Overall, we report an improvement of 36.9% over the phoneme-CTC system on the Flipkart Voice Search dataset. The paper also provides an overview of different components that can be tuned in a LAS based system.
Pages: 107-113
Number of pages: 7