Attention based end to end Speech Recognition for Voice Search in Hindi and English

被引:4
|
作者
Joshi, Raviraj [1 ]
Kannan, Venkateshan [1 ]
机构
[1] Flipkart, Bengaluru, India
关键词
automatic speech recognition; encoder-decoder models; attention; listen attend spell; SYSTEM; CHALLENGE;
D O I
10.1145/3503162.3503173
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We describe here our work with automatic speech recognition (ASR) in the context of voice search functionality on the Flipkart e-Commerce platform. Starting with the deep learning architecture of Listen-Attend-Spell (LAS), we build upon and expand the model design and attention mechanisms to incorporate innovative approaches including multi-objective training, multi-pass training, and external rescoring using language models and phoneme based losses. We report a relative WER improvement of 15.7% on top of state-of-the-art LAS models using these modifications. Overall, we report an improvement of 36.9% over the phoneme-CTC system on the Flipkart Voice Search dataset. The paper also provides an overview of different components that can be tuned in a LAS based system.
引用
收藏
页码:107 / 113
页数:7
相关论文
共 50 条
  • [1] ATTENTION-BASED END-TO-END SPEECH RECOGNITION ON VOICE SEARCH
    Shan, Changhao
    Zhang, Junbo
    Wang, Yujun
    Xie, Lei
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4764 - 4768
  • [2] END-TO-END SPEECH RECOGNITION ACCURACY METRIC FOR VOICE-SEARCH TASKS
    Levit, Michael
    Chang, Shuangyu
    Buntschuh, Bruce
    Kibre, Nick
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 5141 - 5144
  • [3] An End-to-end Speech Recognition Algorithm based on Attention Mechanism
    Chen, Jia-nan
    Gao, Shuang
    Sun, Han-zhe
    Liu, Xiao-hui
    Wang, Zi-ning
    Zheng, Yan
    PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 2935 - 2940
  • [4] Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition
    Weng, Chao
    Cui, Jia
    Wang, Guangsen
    Wang, Jun
    Yu, Changzhu
    Su, Dan
    Yu, Dong
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 761 - 765
  • [5] TRIGGERED ATTENTION FOR END-TO-END SPEECH RECOGNITION
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5666 - 5670
  • [6] Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition
    Sun, Sining
    Guo, Pengcheng
    Xie, Lei
    Hwang, Mei-Yuh
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1826 - 1838
  • [7] Gaussian Prediction based Attention for Online End-to-End Speech Recognition
    Hou, Junfeng
    Zhang, Shiliang
    Dai, Lirong
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3692 - 3696
  • [8] END-TO-END ATTENTION-BASED LARGE VOCABULARY SPEECH RECOGNITION
    Bandanau, Dzmitry
    Chorowski, Jan
    Serdyuk, Dmitriy
    Brakel, Philemon
    Bengio, Yoshua
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 4945 - 4949
  • [9] Speaker Adaptation for Attention-Based End-to-End Speech Recognition
    Meng, Zhong
    Gaur, Yashesh
    Li, Jinyu
    Gong, Yifan
    INTERSPEECH 2019, 2019, : 241 - 245
  • [10] Large Margin Training for Attention Based End-to-End Speech Recognition
    Wang, Peidong
    Cui, Jia
    Weng, Chao
    Yu, Dong
    INTERSPEECH 2019, 2019, : 246 - 250