Robust speech recognition in reverberant environments by using an optimal synthetic room impulse response model

Cited by: 8
Authors
Liu, Jindong [1]
Yang, Guang-Zhong [1]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Hamlyn Ctr, London, England
Keywords
Speech recognition; Reverberant environment; Artificial synthetic room impulse response;
DOI
10.1016/j.specom.2014.11.004
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a practical technique for automatic speech recognition (ASR) across multiple reverberant environments, based on model selection. Multiple ASR models are trained with artificial synthetic room impulse responses (IRs), i.e. simulated room IRs, with different reverberation times, T60(Model), and tested on real room IRs with varying reverberation times, T60(Room). The biggest challenge in applying the method is choosing a proper artificial room IR model for training the ASR models. In this paper, a generalised statistical IR model with attenuated reverberation after an early reflection period, named the attenuated IR model, has been adopted based on three time-domain statistical IR models. The values of its reverberation-attenuation factor and early reflection period that are optimal for the recognition rate have been searched and determined. Extensive testing has been performed over four real room IR sets (63 IRs in total) with various T60(Room) values and speaker-microphone distances (SMDs). The optimised attenuated IR model achieved the best recognition rate among the models compared. Specific considerations for the practical use of the method have been taken into account, including: (i) the maximal training step of T60(Model) needed to obtain the minimal number of models with acceptable performance; (ii) the impact on the ASR of selection errors caused by the estimation error of T60(Room); and (iii) the performance over SMD and direct-to-reverberation energy ratio (DRR). It is shown that recognition rates of over 80 to 90% are achieved in most cases. One important advantage of the method is that T60(Room) can be estimated either from reverberant sound directly (Takeda et al., 2009; Falk and Chan, 2010; Löllmann et al., 2010) or from an IR measured at any point of the room, as it remains constant within the same room (Kuttruff, 2000); the method is therefore particularly suited to mobile applications. Compared to many classical dereverberation methods, the proposed method is better suited to ASR tasks in multiple reverberant environments, such as human-robot interaction. © 2014 Elsevier B.V. All rights reserved.
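The two ideas in the abstract, a synthetic IR whose reverberant tail is attenuated after an early reflection period and the selection of the trained model whose T60(Model) is closest to the estimated T60(Room), can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the model's functional form, so the sketch assumes an exponentially decaying Gaussian-noise IR, and the names `synthetic_ir`, `select_model`, `early_ms` and `attenuation`, as well as all default values, are hypothetical placeholders rather than the optimal values reported in the paper.

```python
import numpy as np


def synthetic_ir(t60, fs=16000, early_ms=50.0, attenuation=0.5, seed=0):
    """Illustrative synthetic room IR (assumed form, not the paper's model).

    Gaussian noise is shaped by an exponential envelope that drops by
    60 dB in energy at t = t60; samples after the early reflection
    period are additionally scaled by `attenuation`.
    """
    rng = np.random.default_rng(seed)
    n = int(round(t60 * fs))          # IR length of t60 seconds
    t = np.arange(n) / fs
    # Amplitude envelope 10^(-3 t / t60), i.e. -60 dB at t = t60.
    envelope = np.exp(-3.0 * np.log(10.0) * t / t60)
    ir = rng.standard_normal(n) * envelope
    # Attenuate the late reverberation after the early reflection period.
    n_early = int(round(early_ms * 1e-3 * fs))
    ir[n_early:] *= attenuation
    ir[0] = 1.0                       # direct-path impulse (assumption)
    return ir / np.max(np.abs(ir))    # peak-normalise


def select_model(t60_room, model_t60s):
    """Return the index of the model whose T60 is nearest to t60_room."""
    model_t60s = np.asarray(model_t60s, dtype=float)
    return int(np.argmin(np.abs(model_t60s - t60_room)))


# Example: models trained at a hypothetical 0.2 s step in T60(Model).
model_t60s = [0.2, 0.4, 0.6, 0.8]
chosen = select_model(0.62, model_t60s)
ir = synthetic_ir(model_t60s[chosen])
```

Reverberant training speech could then be produced by convolving clean utterances with such an IR, for example with `numpy.convolve`. The step between model T60 values and the tolerance to T60(Room) estimation error are the practical questions the abstract lists under (i) and (ii).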
Pages: 65-77
Number of pages: 13
Related papers
50 in total
  • [41] SUBBAND MINIMUM CLASSIFICATION ERROR BEAMFORMING FOR SPEECH RECOGNITION IN REVERBERANT ENVIRONMENTS
    Liao, Yuan-Fu
    Xu, I-Yun
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4702 - 4705
  • [42] SOUNDFIELD RECONSTRUCTION IN REVERBERANT ENVIRONMENTS USING HIGHER-ORDER MICROPHONES AND IMPULSE RESPONSE MEASUREMENTS
    Borra, Federico
    Gebru, Israel Dejene
    Markovic, Dejan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 281 - 285
  • [43] Subband likelihood-maximizing beamforming for speech recognition in reverberant environments
    Seltzer, Michael L.
    Stern, Richard M.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (06): : 2109 - 2121
  • [44] Subband parameter optimization of microphone arrays for speech recognition in reverberant environments
    Seltzer, ML
    Stern, RM
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 408 - 411
  • [45] Instantaneous model adaptation method for reverberant speech recognition
    Ban, Sung Min
    Kim, Hyung Soon
    ELECTRONICS LETTERS, 2015, 51 (06) : 528 - 529
  • [46] Estimation of speech recognition performance in noisy and reverberant environments using PESQ score and acoustic parameters
    Fukumori, Takahiro
    Nakayama, Masato
    Nishiura, Takanobu
    Yamashita, Yoichi
    2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
  • [47] Model compensation using robust features for robust speech recognition
    Zhang, Jun
    Wei, Gang
    Shuju Caiji Yu Chuli/Journal of Data Acquisition and Processing, 2003, 18 (03):
  • [48] ESTIMATING ROOM ACOUSTIC PARAMETERS FOR SPEECH RECOGNIZER ADAPTATION AND COMBINATION IN REVERBERANT ENVIRONMENTS
    Xiong, Feifei
    Goetze, Stefan
    Meyer, Bernd T.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [49] Blind Estimation of Speech Transmission Index and Room Acoustic Parameters by Using Extended Model of Room Impulse Response Derived From Speech Signals
    Wang, Lijun
    Duangpummet, Suradej
    Unoki, Masashi
    IEEE ACCESS, 2023, 11 : 49431 - 49444
  • [50] Auditory model for robust speech recognition in real world noisy environments
    Kim, DS
    Lee, SY
    Kil, RM
    Zhu, XL
    ELECTRONICS LETTERS, 1997, 33 (01) : 12 - 13