Reverberation Model-Based Decoding in the Logmelspec Domain for Robust Distant-Talking Speech Recognition

Cited by: 42
Authors
Sehr, Armin [1]
Maas, Roland [1]
Kellermann, Walter [1]
Affiliations
[1] Univ Erlangen Nurnberg, Chair Multimedia Commun & Signal Proc, D-91058 Erlangen, Germany
Keywords
Acoustic modeling; distant-talking automatic speech recognition (ASR); model-based dereverberation; reverberation model; robust ASR; LINEAR PREDICTION; COMPENSATION; ADAPTATION
DOI
10.1109/TASL.2010.2050511
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The REMOS (REverberation MOdeling for Speech recognition) concept for reverberation-robust distant-talking speech recognition, introduced in "Distant-talking continuous speech recognition based on a novel reverberation model in the feature domain" (A. Sehr et al., in Proc. Interspeech, 2006, pp. 769-772) for melspectral features, is extended to logarithmic melspectral (logmelspec) features in this contribution. Thus, the favorable properties of REMOS, including its high flexibility with respect to changing reverberation conditions, become available in the more competitive logmelspec domain. Based on a combined acoustic model consisting of a hidden Markov model (HMM) network and a reverberation model (RM), REMOS determines clean-speech and reverberation estimates during recognition. Therefore, in each iteration of a modified Viterbi algorithm, an inner optimization operation maximizes the joint density of the current HMM output and the RM output subject to the constraint that their combination is equal to the current reverberant observation. Since the combination operation in the logmelspec domain is nonlinear, numerical methods appear necessary for solving the constrained inner optimization problem. A novel reformulation of the constraint, which allows for an efficient solution by nonlinear optimization algorithms, is derived in this paper so that a practicable implementation of REMOS for logmelspec features becomes possible. An in-depth analysis of this REMOS implementation investigates the statistical properties of its reverberation estimates and thus derives possibilities for further improving the performance of REMOS. Connected digit recognition experiments show that the proposed REMOS version in the logmelspec domain significantly outperforms the melspec version. 
While the proposed RMs with parameters estimated by straightforward training for a given room are robust to a mismatch of the speaker-microphone distance, their performance significantly decreases if they are used in a room with substantially different conditions. However, by training multi-style RMs with data from several rooms, good performance can be achieved across different rooms.
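The constrained inner optimization described in the abstract can be illustrated with a toy, single-channel sketch. Everything below is an assumption made for illustration and is not taken from the paper: the clean-speech term s and the reverberation term r are each modeled by a scalar Gaussian in the log domain, the combination is assumed to be x = log(exp(s) + exp(r)), and the constraint is removed by the substitution s = x + log(sigmoid(t)), r = x + log(1 - sigmoid(t)), which satisfies it for every real t. The function names and the grid-plus-golden-section search are likewise hypothetical choices, not the authors' reformulation or solver.

```python
import math


def log_gauss(v, mean, var):
    """Log density of a scalar Gaussian N(mean, var) at v."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def split(x, t):
    """Map a free parameter t to (s, r) with log(exp(s) + exp(r)) == x.

    Assumed substitution: s = x + log(sigmoid(t)), r = x + log(1 - sigmoid(t)).
    """
    s = x - softplus(-t)
    r = x - softplus(t)
    return s, r


def joint_score(x, t, mu_s, var_s, mu_r, var_r):
    """Joint log density of the two estimates for a given t."""
    s, r = split(x, t)
    return log_gauss(s, mu_s, var_s) + log_gauss(r, mu_r, var_r)


def inner_optimization(x, mu_s, var_s, mu_r, var_r,
                       t_max=30.0, grid=601, iters=80):
    """Maximize the joint log density over t; return (s, r, score)."""
    def f(t):
        return joint_score(x, t, mu_s, var_s, mu_r, var_r)

    # Coarse grid search to bracket the best candidate.
    step = 2.0 * t_max / (grid - 1)
    best = max((-t_max + i * step for i in range(grid)), key=f)

    # Golden-section refinement inside the bracket around the grid optimum.
    lo, hi = best - step, best + step
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a = hi - g * (hi - lo)
        b = lo + g * (hi - lo)
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    t = 0.5 * (lo + hi)
    s, r = split(x, t)
    return s, r, f(t)


if __name__ == "__main__":
    s, r, score = inner_optimization(x=1.0, mu_s=0.5, var_s=0.2,
                                     mu_r=-1.0, var_r=1.0)
    print(s, r, score)
```

In a decoder of the kind the abstract describes, a step like this would be evaluated inside each iteration of the modified Viterbi algorithm, with the Gaussian parameters supplied by the current HMM state and the reverberation model. The one-dimensional substitution only works for this scalar toy case; a practical system would presumably use a dedicated nonlinear optimizer over full feature vectors.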
Pages: 1676-1691
Page count: 16
Related papers
50 in total
  • [1] MODEL-BASED DEREVERBERATION IN THE LOGMELSPEC DOMAIN FOR ROBUST DISTANT-TALKING SPEECH RECOGNITION
    Sehr, Armin
    Maas, Roland
    Kellermann, Walter
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4298 - 4301
  • [2] Distant-talking Continuous Speech Recognition based on a novel Reverberation Model in the Feature Domain
    Sehr, Armin
    Zeller, Marcus
    Kellermann, Walter
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 769 - 772
  • [3] Robust distant-talking speech recognition
    Lin, Q
    Che, C
    Yuk, DS
    Jin, L
    deVries, B
    Pearson, J
    Flanagan, J
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 21 - 24
  • [4] Phase and reverberation aware DNN for distant-talking speech enhancement
    Oo, Zeyan
    Wang, Longbiao
    Phapatanaburi, Khomdet
    Iwahashi, Masahiro
    Nakagawa, Seiichi
    Dang, Jianwu
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (14) : 18865 - 18880
  • [5] Phase and reverberation aware DNN for distant-talking speech enhancement
    Zeyan Oo
    Longbiao Wang
    Khomdet Phapatanaburi
    Masahiro Iwahashi
    Seiichi Nakagawa
    Jianwu Dang
    Multimedia Tools and Applications, 2018, 77 : 18865 - 18880
  • [6] Investigations into Early and Late Reflections on Distant-Talking Speech Recognition Toward Suitable Reverberation Criteria
    Nishiura, Takanobu
    Hirano, Yoshiki
    Denda, Yuki
    Nakayama, Masato
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1369 - 1372
  • [7] Improved HMM separation for distant-talking speech recognition
    Takiguchi, T
    Nishimura, M
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (05): : 1127 - 1137
  • [8] Hidden Markov model training with contaminated speech material for distant-talking speech recognition
    Matassoni, M
    Omologo, M
    Giuliani, D
    Svaizer, P
    COMPUTER SPEECH AND LANGUAGE, 2002, 16 (02): : 205 - 223
  • [9] JOINT SPARSE REPRESENTATION BASED CEPSTRAL-DOMAIN DEREVERBERATION FOR DISTANT-TALKING SPEECH RECOGNITION
    Li, Weifeng
    Wang, Longbiao
    Zhou, Fei
    Liao, Qingmin
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7117 - 7120
  • [10] A reverberation robust target speech detection method using dual-microphone in distant-talking scene
    Wang, Xiaofei
    Guo, Yanmeng
    Wu, Chao
    Fu, Qiang
    Yan, Yonghong
    SPEECH COMMUNICATION, 2015, 72 : 47 - 58