A Study on Improving Acoustic Model for Robust and Far-Field Speech Recognition

Times cited: 0
Authors
Xue, Shaofei [1]
Yan, Zhijie [1]
Yu, Tao [2]
Liu, Zhang [3]
Affiliations
[1] Alibaba Inc., Beijing, People's Republic of China
[2] Alibaba Group US Inc., Seattle, WA, USA
[3] Alibaba Inc., Hangzhou, Zhejiang, People's Republic of China
Keywords
far-field speech recognition; deep neural network; simulated data; Mandarin Chinese; DEEP NEURAL-NETWORKS
DOI
Not available
Chinese Library Classification (CLC) numbers
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Far-field speech recognition is an essential technique for human-machine interaction: it enables smart devices to recognize speech from distant talkers. It is used in many scenarios, such as smart home appliances (smart speakers, smart TVs) and meeting transcription. Although deep neural network based acoustic models have brought significant advances in robust and far-field speech recognition, far-field recognition remains challenging because of factors such as background noise, reverberation, and interfering speech from other talkers. In this paper, we describe several technical advances for improving large-scale far-field speech recognition, including simulated data generation, improved front-end modules, and neural network based acoustic models. Experimental results on several Mandarin Chinese speech recognition tasks demonstrate that combining these advances significantly outperforms conventional models.
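The abstract names simulated data generation without giving its recipe. A common way to simulate far-field training data, assumed here rather than taken from the paper, is to convolve close-talk speech with a room impulse response (RIR) and then add background noise scaled to a target signal-to-noise ratio (SNR). The sketch below uses only the Python standard library; the function names, the toy exponentially decaying RIR, and all parameter values are hypothetical illustrations, not the authors' pipeline.

```python
# Sketch of a common far-field simulation recipe (assumed, not the paper's method):
# reverberate clean speech with an RIR, then add noise at a target SNR.
import math
import random
from typing import List, Sequence


def convolve(signal: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Full linear convolution of a dry signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += x * h
    return out


def power(x: Sequence[float]) -> float:
    """Mean power: the average of the squared samples."""
    return sum(v * v for v in x) / len(x)


def synthetic_rir(length: int, sample_rate: int, rt60: float,
                  rng: random.Random) -> List[float]:
    """Hypothetical toy RIR: a unit direct-path tap followed by a Gaussian tail
    whose amplitude decays so that energy falls by 60 dB after rt60 seconds."""
    decay = 3.0 * math.log(10.0) / (rt60 * sample_rate)  # per-sample amplitude decay
    tail = [rng.gauss(0.0, 0.1) * math.exp(-decay * k) for k in range(1, length)]
    return [1.0] + tail


def simulate_far_field(clean: Sequence[float], rir: Sequence[float],
                       noise: Sequence[float], snr_db: float) -> List[float]:
    """Reverberate clean speech with an RIR, then add noise at snr_db."""
    reverberant = convolve(clean, rir)[: len(clean)]  # keep the original length
    # Loop the noise if it is shorter than the speech.
    looped = [noise[i % len(noise)] for i in range(len(reverberant))]
    noise_power = power(looped)
    if noise_power == 0.0:
        raise ValueError("noise must have non-zero power")
    # Choose the gain so that 10 * log10(P_speech / P_scaled_noise) == snr_db.
    gain = math.sqrt(power(reverberant) / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(reverberant, looped)]


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 16000
    clean = [math.sin(2 * math.pi * 440 * t / fs) for t in range(1600)]  # 0.1 s tone
    rir = synthetic_rir(length=800, sample_rate=fs, rt60=0.3, rng=rng)
    noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    noisy = simulate_far_field(clean, rir, noise, snr_db=10.0)
    print(len(noisy))  # 1600: same length as the clean input
```

In a real pipeline, measured or image-method RIRs and recorded noise would replace the synthetic ones, and an FFT-based convolution would replace the pure-Python loop, which costs O(N·M) operations and is only practical for short clips.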
Pages: 5