Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech

Cited by: 5
Authors
Airaksinen, Manu [1]
Juvela, Lauri [1]
Rasanen, Okko [1]
Alku, Paavo [1]
Affiliations
[1] Aalto Univ, Espoo, Finland
Funding
Academy of Finland;
Keywords
speech analysis; linear prediction; robust features;
DOI
10.21437/Interspeech.2018-1230
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature extraction of speech signals is typically performed in short-time frames by assuming that the signal is stationary within each frame. For the extraction of the spectral envelope of speech, which conveys the formant frequencies produced by the resonances of the slowly varying vocal tract, a commonly used frame length is 20-30 ms. However, this kind of conventional frame-based spectral analysis is oblivious to the broader temporal context of the signal and is prone to degradation by, for example, environmental noise. In this paper, we propose a new frame-based linear prediction (LP) analysis method that includes a regularization term that penalizes energy differences in consecutive frames of an all-pole spectral envelope model. This integrates the slowly varying nature of the vocal tract into the analysis. Objective evaluations related to feature distortion and phonetic representational capability were performed by studying the properties of the mel-frequency cepstral coefficient (MFCC) representations computed from different spectral estimation methods under noisy conditions using the TIMIT database. The results show that the proposed time-regularized LP approach exhibits superior MFCC distortion behavior while simultaneously having the greatest average separability of different phoneme categories in comparison to the other methods.
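The idea of coupling consecutive frames in LP analysis can be illustrated with a minimal NumPy sketch. It is not the method of the paper: the abstract describes a penalty on energy differences between consecutive frames of the all-pole envelope, whereas this sketch assumes a simpler, causal stand-in that penalizes the squared difference between the predictor coefficients of the current and the previous frame. All function names, the test signal, the frame sizes, and the scaling of `lam` are hypothetical choices made for illustration.

```python
import numpy as np

def synth_ar2(n, seed=0):
    """Hypothetical test signal: AR(2) process driven by white noise."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    x = np.zeros(n)
    x[:2] = e[:2]
    for i in range(2, n):
        x[i] = e[i] + 1.3 * x[i - 1] - 0.6 * x[i - 2]
    return x

def lp_normal_equations(frame, order):
    """Covariance-method LP: predict frame[n] from the `order` previous samples."""
    n = len(frame)
    # Column k-1 holds the signal delayed by k samples.
    X = np.column_stack([frame[order - k : n - k] for k in range(1, order + 1)])
    y = frame[order:]
    return X.T @ X, X.T @ y

def time_regularized_lp(x, order=10, frame_len=400, hop=160, lam=0.0):
    """Per-frame LP with a penalty lam * ||a_t - a_{t-1}||^2 (assumed stand-in).

    Each frame solves (R + mu*I) a_t = r + mu * a_{t-1}, where mu is `lam`
    scaled by the mean diagonal of R so that `lam` is dimensionless.
    The first frame has no predecessor and is solved without the penalty.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    coeffs = np.zeros((n_frames, order))
    prev = None
    for t in range(n_frames):
        frame = x[t * hop : t * hop + frame_len]
        R, r = lp_normal_equations(frame, order)
        mu = 0.0 if prev is None else lam * np.trace(R) / order
        target = np.zeros(order) if prev is None else prev
        # Tiny absolute ridge keeps the system solvable for silent frames.
        A = R + (mu + 1e-12) * np.eye(order)
        coeffs[t] = np.linalg.solve(A, r + mu * target)
        prev = coeffs[t]
    return coeffs

def lp_envelope(a, n_fft=512):
    """Magnitude of the all-pole model 1 / |A(e^{jw})|, A(z) = 1 - sum_k a_k z^-k."""
    A = np.fft.rfft(np.concatenate(([1.0], -a)), n_fft)
    return 1.0 / np.abs(A)
```

Calling `time_regularized_lp(x, lam=0.0)` reduces to conventional frame-by-frame LP, while larger `lam` values pull each frame's coefficients toward those of the previous frame, which smooths the envelope trajectory over time. A non-causal formulation that solves all frames jointly would be closer in spirit to a regularization across the whole utterance, but it is omitted here to keep the sketch short.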
Pages: 701 - 705
Number of pages: 5
Related papers
50 items in total
  • [21] Extended VTS for Noise-Robust Speech Recognition
    van Dalen, Rogier C.
    Gales, Mark J. F.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) : 733 - 743
  • [22] Lyapunov exponents from a time series: A noise-robust extraction algorithm
    Banbrook, M
    Ushaw, G
    McLaughlin, S
    CHAOS SOLITONS & FRACTALS, 1996, 7 (07) : 973 - 976
  • [23] Unsupervised spectral subtraction for noise-robust ASR
    Lathoud, G
    Magimai-Doss, M
    Mesot, B
    Bourlard, H
    2005 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2005, : 343 - 348
  • [24] A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks
    Li, Bo
    Sim, Khe Chai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (08) : 1296 - 1305
  • [25] SPEECH TEMPORAL DYNAMICS FUSION APPROACHES FOR NOISE-ROBUST REVERBERATION TIME ESTIMATION
    Senoussaoui, Mohammed
    Santos, Joao F.
    Falk, Tiago H.
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5545 - 5549
  • [26] Unsupervised learning of time-frequency patches as a noise-robust representation of speech
    Van Segbroeck, Maarten
    Van Hamme, Hugo
    SPEECH COMMUNICATION, 2009, 51 (11) : 1124 - 1138
  • [27] Noisy Constrained Maximum-Likelihood Linear Regression for Noise-Robust Speech Recognition
    Kim, D. K.
    Gales, M. J. F.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (02) : 315 - 325
  • [28] Noise-robust parameter estimation of linear systems
    Emara-Shabaik, HE
    JOURNAL OF VIBRATION AND CONTROL, 2000, 6 (05) : 727 - 740
  • [29] An engineering model of the masking for the noise-robust speech recognition
    Park, KY
    Lee, SY
    NEUROCOMPUTING, 2003, 52-4 : 615 - 620
  • [30] Application of Slope Filtering to Robust Spectral Envelope Extraction for Speech/Speaker Recognition
    Drgas, Szymon
    Dabrowski, Adam
    HUMAN LANGUAGE TECHNOLOGY: CHALLENGES OF THE INFORMATION SOCIETY, 2009, 5603 : 13 - 23