Robust speech recognition in reverberant environments by using an optimal synthetic room impulse response model

Cited by: 8
Authors
Liu, Jindong [1]
Yang, Guang-Zhong [1]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Hamlyn Ctr, London, England
Keywords
Speech recognition; Reverberant environment; Artificial synthetic room impulse response
DOI
10.1016/j.specom.2014.11.004
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents a practical technique for automatic speech recognition (ASR) across multiple reverberant environments, based on model selection. Multiple ASR models are trained with artificial synthetic room impulse responses (IRs), i.e. simulated room IRs, with different reverberation times (T60(Model)) and tested on real room IRs with varying T60(Room). The biggest challenge in applying the method is choosing a suitable artificial room IR model for training the ASR models. In this paper, a generalised statistical IR model with attenuated reverberation after an early reflection period, named the attenuated IR model, is adopted based on three time-domain statistical IR models. The values of its reverberation-attenuation factor and early reflection period that maximise the recognition rate have been searched for and determined. Extensive testing has been performed over four real room IR sets (63 IRs in total) with varying T60(Room) and speaker-microphone distances (SMDs). The optimised attenuated IR model achieved the best recognition rate among the models compared. Specific considerations for the practical use of the method have been taken into account, including: (i) the maximal training step of T60(Model) that yields the minimal number of models with acceptable performance; (ii) the impact on ASR of selection errors caused by the estimation error of T60(Room); and (iii) the performance over SMD and direct-to-reverberation energy ratio (DRR). Recognition rates of over 80-90% are achieved in most cases. One important advantage of the method is that T60(Room) can be estimated either directly from reverberant sound (Takeda et al., 2009; Falk and Chan, 2010; Löllmann et al., 2010) or from an IR measured at any point in the room, since it remains constant within the same room (Kuttruff, 2000); the method is therefore particularly suited to mobile applications.
Compared to many classical dereverberation methods, the proposed method is more suited to ASR tasks in multiple reverberant environments, such as human-robot interaction. © 2014 Elsevier B.V. All rights reserved.
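The two ideas in the abstract, a synthetic IR whose reverberation is attenuated after an early reflection period, and picking the ASR model whose training T60 is closest to the estimated room T60, can be illustrated with a minimal sketch. This is only one plausible reading of the abstract, not the paper's formulation: it assumes a Polack-style IR (Gaussian noise under an exponential envelope that falls 60 dB in energy after T60 seconds), and the names `envelope`, `attenuated_ir`, `select_model`, and the defaults `early_s=0.05` and `atten=0.5` are hypothetical placeholders, not the optimal values reported by the authors.

```python
import math
import random


def envelope(t, t60, early_s, atten):
    """Amplitude envelope at time t (seconds).

    Energy decays by 60 dB after t60 seconds, so amplitude decays as
    exp(-3*ln(10)*t/t60). After the early-reflection period `early_s`,
    the tail is additionally scaled by `atten` (assumed form).
    """
    decay = 3.0 * math.log(10.0) / t60
    gain = 1.0 if t < early_s else atten
    return gain * math.exp(-decay * t)


def attenuated_ir(t60, fs=16000, early_s=0.05, atten=0.5, seed=0):
    """Synthetic room IR of duration t60 seconds, sampled at fs Hz."""
    rng = random.Random(seed)
    n = int(round(t60 * fs))
    ir = [rng.gauss(0.0, 1.0) * envelope(k / fs, t60, early_s, atten)
          for k in range(n)]
    ir[0] = 1.0  # unit-amplitude direct path (illustrative assumption)
    return ir


def select_model(t60_room, model_t60s):
    """Return the training T60 closest to the estimated room T60."""
    return min(model_t60s, key=lambda m: abs(m - t60_room))


if __name__ == "__main__":
    grid = [0.2, 0.4, 0.6, 0.8]  # hypothetical T60(Model) training grid, seconds
    irs = {t60: attenuated_ir(t60) for t60 in grid}
    chosen = select_model(0.47, grid)  # 0.47 s stands in for an estimated T60(Room)
    print(chosen, len(irs[chosen]))
```

In a full pipeline, each synthetic IR would be convolved with clean training speech to train one ASR model per grid point. The spacing of the grid corresponds to the "training step of T60(Model)" discussed in the abstract, and an error in the estimated T60(Room) can cause `select_model` to pick a neighbouring model.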
Pages: 65-77
Page count: 13
Related papers (50 in total)
  • [1] IMPULSE RESPONSE ESTIMATION FOR ROBUST SPEECH RECOGNITION IN A REVERBERANT ENVIRONMENT
    Ravanelli, Mirco
    Sosi, Alessandro
    Svaizer, Piergiorgio
    Omologo, Maurizio
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 1668 - 1672
  • [2] IMPROVING REVERBERANT SPEECH SEPARATION WITH SYNTHETIC ROOM IMPULSE RESPONSES
    Aralikatti, Rohith
    Ratnarajah, Anton
    Tang, Zhenyu
    Manocha, Dinesh
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 900 - 906
  • [3] Methods for Robust Speech Recognition in Reverberant Environments: A Comparison
    Petrick, Rico
    Feher, Thomas
    Unoki, Masashi
    Hoffmann, Ruediger
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 582 - +
  • [4] Robust interference suppression and blind speech beamforming in room reverberant environments
    Ma, WK
    Ching, PC
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO AND ELECTROACOUSTICS MULTIMEDIA SIGNAL PROCESSING, 2003, : 493 - 496
  • [5] Speech Recognition in reverberant environments using remote microphones
    Brayda, Luca
    Wellekens, Christian
    Matassoni, Marco
    Omologo, Maurizio
    ISM 2006: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA, PROCEEDINGS, 2006, : 584 - 591
  • [6] Robust automatic speech recognition based on neural network in reverberant environments
    Bai, L.
    Li, H. L.
    He, Y. Y.
    CIVIL, ARCHITECTURE AND ENVIRONMENTAL ENGINEERING, VOLS 1 AND 2, 2017, : 1319 - 1324
  • [7] Distant-talking robust speech recognition using late reflection components of room impulse response
    Gomez, Randy
    Even, Jani
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4581 - 4584
  • [8] Robust Front End Processing for Speech Recognition in Reverberant Environments: Utilization of Speech Characteristics
    Petrick, Rico
    Lu, Xugang
    Unoki, Masashi
    Akagi, Masato
    Hoffmann, Ruediger
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 658 - +
  • [9] Blind model selection for automatic speech recognition in reverberant environments
    Couvreur, Laurent
    Couvreur, Christophe
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2004, 36 (2-3): 189 - 203