Robust speech recognition in reverberant environments by using an optimal synthetic room impulse response model

Cited by: 8
Authors
Liu, Jindong [1 ]
Yang, Guang-Zhong [1 ]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Hamlyn Ctr, London, England
Keywords
Speech recognition; Reverberant environment; Artificial synthetic room impulse response;
DOI
10.1016/j.specom.2014.11.004
Chinese Library Classification
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
This paper presents a practical technique for automatic speech recognition (ASR) with model selection across multiple reverberant environments. Multiple ASR models are trained with artificial synthetic room impulse responses (IRs), i.e. simulated room IRs, with different reverberation times (T60(Model)s) and tested on real room IRs with varying T60(Room)s. The main challenge in applying the method is choosing a suitable artificial room IR model for training the ASR models. In this paper, a generalised statistical IR model with attenuated reverberation after an early reflection period, named the attenuated IR model, is adopted based on three time-domain statistical IR models. The values of its reverberation-attenuation factor and early reflection period that maximise the recognition rate have been searched for and determined. Extensive testing has been performed over four real room IR sets (63 IRs in total) with varying T60(Room)s and speaker-to-microphone distances (SMDs). The optimised attenuated IR model achieved the best recognition rate among the models compared. Practical considerations of the method are also addressed, including: (i) the maximal training step of T60(Model) needed to obtain the minimal number of models with acceptable performance; (ii) the impact on ASR of selection errors caused by the estimation error of T60(Room); and (iii) the performance as a function of SMD and direct-to-reverberant energy ratio (DRR). Recognition rates of over 80-90% are achieved in most cases. One important advantage of the method is that T60(Room) can be estimated either from reverberant sound directly (Takeda et al., 2009; Falk and Chan, 2010; Lollmann et al., 2010) or from an IR measured at any point of the room, since it remains constant within the same room (Kuttruff, 2000); the method is therefore particularly suited to mobile applications. Compared with many classical dereverberation methods, the proposed method is better suited to ASR tasks in multiple reverberant environments, such as human-robot interaction. (C) 2014 Elsevier B.V. All rights reserved.
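The attenuated IR model at the core of the method combines a time-domain statistical IR (exponentially decaying Gaussian noise) with an additional attenuation applied to the tail after an early-reflection period. The Python sketch below only illustrates that idea under assumed values; it is not the paper's implementation. The function name synthetic_attenuated_ir, the 16 kHz sampling rate, the 50 ms early-reflection period and the 0.5 attenuation factor are placeholders, not the optimised settings reported in the paper.

```python
import numpy as np

def synthetic_attenuated_ir(t60, fs=16000, early_ms=50.0, attenuation=0.5):
    """Statistical room IR: exponentially decaying Gaussian noise whose
    late tail (after the early-reflection period) is scaled down by an
    attenuation factor. All default values are illustrative only."""
    n = int(1.5 * t60 * fs)                  # cover the full 60 dB decay
    t = np.arange(n) / fs
    # Amplitude envelope giving a 60 dB energy drop over t60 seconds.
    envelope = np.exp(-3.0 * np.log(10.0) * t / t60)
    ir = np.random.randn(n) * envelope
    # Attenuate the late reverberation beyond the early-reflection period.
    early_n = int(early_ms * 1e-3 * fs)
    ir[early_n:] *= attenuation
    ir[0] = 1.0                              # direct-path impulse
    return ir

# Reverberant training data for one T60(Model) condition is obtained by
# convolving clean utterances with IRs drawn from this generator.
clean_utterance = np.random.randn(16000)     # stand-in for a clean utterance
reverberant = np.convolve(clean_utterance, synthetic_attenuated_ir(0.6))
```

In the paper's procedure the reverberation-attenuation factor and early-reflection period are swept to find the values that maximise recognition rate, and T60(Model) is varied across training conditions so that the model whose T60(Model) best matches the estimated T60(Room) can be selected at test time.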
Pages: 65-77
Number of pages: 13
Related Papers
50 items in total
  • [21] A STUDY ON DATA AUGMENTATION OF REVERBERANT SPEECH FOR ROBUST SPEECH RECOGNITION
    Ko, Tom
    Peddinti, Vijayaditya
    Povey, Daniel
    Seltzer, Michael L.
    Khudanpur, Sanjeev
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5220 - 5224
  • [22] FILTERED NOISE SHAPING FOR TIME DOMAIN ROOM IMPULSE RESPONSE ESTIMATION FROM REVERBERANT SPEECH
    Steinmetz, Christian J.
    Ithapu, Vamsi Krishna
    Calamia, Paul
    2021 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2021, : 221 - 225
  • [23] Acoustic diversity for improved speech recognition in reverberant environments
    Gillespie, BW
    Atlas, LE
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 557 - 560
  • [24] Speech recognition in multisource reverberant environments with binaural inputs
    Roman, Nicoleta
    Srinivasan, Soundararajan
    Wang, DeLiang
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 309 - 312
  • [25] ROBUST SPEECH RECOGNITION IN UNKNOWN REVERBERANT AND NOISY CONDITIONS
    Hsiao, Roger
    Ma, Jeff
    Hartmann, William
    Karafiat, Martin
    Grezl, Frantisek
    Burget, Lukas
    Szoke, Igor
    Cernocky, Jan Honza
    Watanabe, Shinji
    Chen, Zhuo
    Mallidi, Sri Harish
    Hermansky, Hynek
    Tsakalidis, Stavros
    Schwartz, Richard
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 533 - 538
  • [26] Techniques for robust speech recognition in noisy and reverberant conditions
    Brown, GJ
    Palomäki, KJ
    SPEECH SEPARATION BY HUMANS AND MACHINES, 2005, : 213 - 220
  • [27] Feature Transformations for Robust Speech Recognition in Reverberant Conditions
    Yuliani, Asri R.
    Sustika, Rika
    Yuwana, Raden S.
    Pardede, Hilman F.
    2017 INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL, INFORMATICS AND ITS APPLICATIONS (IC3INA), 2017, : 57 - 62
  • [28] Missing feature speech recognition using dereverberation and echo suppression in reverberant environments
    Park, Hyung-Min
    Stern, Richard M.
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 381 - +
  • [29] ROBUST RECOGNITION OF REVERBERANT AND NOISY SPEECH USING COHERENCE-BASED PROCESSING
    Menon, Anjali
    Kim, Chanwoo
    Stern, Richard M.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6775 - 6779
  • [30] Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network
    Liao, Zhiheng
    Xiong, Feifei
    Luo, Juan
    Cai, Minjie
    Chng, Eng Siong
    Feng, Jinwei
    Zhong, Xionghu
    INTERSPEECH 2023, 2023, : 2723 - 2727