Modelling non-stationary noise with spectral factorisation in automatic speech recognition

Cited by: 16
Authors
Hurmalainen, Antti [1]
Gemmeke, Jort F. [2]
Virtanen, Tuomas [1]
Affiliations
[1] Tampere Univ Technol, Dept Signal Proc, FI-33101 Tampere, Finland
[2] Katholieke Univ Leuven, Dept ESAT PSI, B-3001 Louvain, Belgium
Source
COMPUTER SPEECH AND LANGUAGE | 2013 / Vol. 27 / Issue 3
Funding
Academy of Finland;
Keywords
Automatic speech recognition; Noise robustness; Non-stationary noise; Non-negative spectral factorisation; Exemplar-based; NONNEGATIVE MATRIX FACTORIZATION; SEPARATION; ALGORITHMS;
DOI
10.1016/j.csl.2012.07.008
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech recognition systems intended for everyday use must be able to cope with a large variety of noise types and levels, including highly non-stationary multi-source mixtures. This study applies spectral factorisation algorithms and long temporal context for separating speech and noise from mixed signals. To adapt the system to varying environments, noise models are acquired from the context, or learnt from the mixture itself without prior information. We also propose methods for reducing the size of the bases used for speech and noise modelling by 20-40 times for better practical applicability. We evaluate the performance of the methods both as a standalone classifier and as a signal-enhancing front-end for external recognisers. For the CHiME noisy speech corpus containing non-stationary multi-source household noises at signal-to-noise ratios ranging from +9 to -6 dB, we report average keyword recognition rates up to 87.8% using a single-stream sparse classification algorithm. (c) 2012 Elsevier Ltd. All rights reserved.
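The separation idea in the abstract can be illustrated with a minimal sketch of supervised non-negative spectral factorisation. It is not the authors' implementation: it works frame by frame, omits the long temporal context, sparsity constraints and basis-reduction methods the abstract mentions, and assumes the speech and noise bases are already available. The function names `estimate_activations` and `separate_speech`, the KL-divergence multiplicative update, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def estimate_activations(V, W, n_iter=200, eps=1e-12):
    """Find non-negative H with V ~= W @ H while keeping the basis W fixed.

    Uses the standard multiplicative update for the generalised
    Kullback-Leibler divergence (an assumed choice for this sketch).
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    # Column sums of W form the denominator of the KL update.
    col_sums = W.sum(axis=0)[:, None] + eps
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / col_sums
    return H


def separate_speech(V, W_speech, W_noise, n_iter=200, eps=1e-12):
    """Split a magnitude spectrogram V into speech and noise estimates.

    W_speech and W_noise are hypothetical pre-computed bases whose
    columns are spectral atoms with the same number of bins as V.
    """
    W = np.hstack([W_speech, W_noise])
    H = estimate_activations(V, W, n_iter=n_iter, eps=eps)
    k = W_speech.shape[1]
    speech_model = W_speech @ H[:k]
    noise_model = W_noise @ H[k:]
    # Soft mask: share of the modelled energy attributed to speech.
    mask = speech_model / (speech_model + noise_model + eps)
    speech_est = mask * V
    noise_est = V - speech_est
    return speech_est, noise_est
```

The soft mask keeps both estimates non-negative and makes them sum to the observed spectrogram, which is convenient when the speech estimate is passed on to an external recogniser as an enhanced signal.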
Pages: 763-779
Number of pages: 17
Related papers
50 in total
  • [21] Markovian Segmentation of Non-stationary Data Corrupted by Non-stationary Noise
    Habbouchi, Ahmed
    Boudaren, Mohamed El Yazid
    Senouci, Mustapha Reda
    Aissani, Amar
    ADVANCES IN COMPUTING SYSTEMS AND APPLICATIONS, 2022, 513 : 27 - 37
  • [22] Speech detection in non-stationary noise based on the 1/f process
    Fan Wang
    Fang Zheng
    Wenhu Wu
    Journal of Computer Science and Technology, 2002, 17 : 83 - 89
  • [23] Single Channel Speech Enhancement for Mixed Non-stationary Noise Environments
    Singh, Sachin
    Tripathy, Manoj
    Anand, R. S.
    ADVANCES IN SIGNAL PROCESSING AND INTELLIGENT RECOGNITION SYSTEMS, 2014, 264 : 545 - 555
  • [24] Speech enhancement for non-stationary noise environment by adaptive wavelet packet
    Chang, S
    Kwon, Y
    Yang, SI
    Kim, IJ
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 561 - 564
  • [25] MODEL-BASED NOISE PSD ESTIMATION FROM SPEECH IN NON-STATIONARY NOISE
    Nielsen, Jesper Kjaer
    Kavalekalam, Mathew Shaji
    Christensen, Mads Graesboll
    Boldt, Jesper
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5424 - 5428
  • [26] Two Methods for Estimating Noise Amplitude Spectral in Non-Stationary Environments
    Ou, Shifeng
    Liu, Wei
    Shen, Suojin
    Gao, Ying
    2016 9TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2016), 2016, : 969 - 973
  • [27] Non-stationary additive noise modelling in direction-of-arrival estimation
    Gholipour, Atefeh
    Zakeri, Bijan
    Mafinezhad, Khalil
    IET COMMUNICATIONS, 2016, 10 (15) : 2054 - 2059
  • [28] NON-STATIONARY NOISE ESTIMATION METHOD BASED ON BIAS-RESIDUAL COMPONENT DECOMPOSITION FOR ROBUST SPEECH RECOGNITION
    Fujimoto, Masakiyo
    Watanabe, Shinji
    Nakatani, Tomohiro
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4816 - 4819
  • [29] SPARSE HMM-BASED SPEECH ENHANCEMENT METHOD FOR STATIONARY AND NON-STATIONARY NOISE ENVIRONMENTS
    Deng, Feng
    Bao, Chang-chun
    Kleijn, W. Bastiaan
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5073 - 5077
  • [30] PITCH ESTIMATION FOR NON-STATIONARY SPEECH
    Christensen, Mads Graesboll
    Jensen, Jesper Rindom
    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2014, : 1400 - 1404