Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech

Cited by: 5

Authors:

Airaksinen, Manu [1]

Juvela, Lauri [1]

Rasanen, Okko [1]

Alku, Paavo [1]

Affiliation:

[1] Aalto Univ, Espoo, Finland

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Funding:

Academy of Finland;

Keywords:

speech analysis; linear prediction; robust features;

DOI:

10.21437/Interspeech.2018-1230

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Feature extraction of speech signals is typically performed in short-time frames by assuming that the signal is stationary within each frame. For the extraction of the spectral envelope of speech, which conveys the formant frequencies produced by the resonances of the slowly varying vocal tract, an often used frame length is within 20-30 ms. However, this kind of conventional frame-based spectral analysis is oblivious of the broader temporal context of the signal and is prone to degradation by, for example, environmental noise. In this paper, we propose a new frame-based linear prediction (LP) analysis method that includes a regularization term that penalizes energy differences in consecutive frames of an all-pole spectral envelope model. This integrates the slowly varying nature of the vocal tract as a part of the analysis. Objective evaluations related to feature distortion and phonetic representational capability were performed by studying the properties of the mel-frequency cepstral coefficient (MFCC) representations computed from different spectral estimation methods under noisy conditions using the TIMIT database. The results show that the proposed time-regularized LP approach exhibits superior MFCC distortion behavior while simultaneously having the greatest average separability of different phoneme categories in comparison to the other methods.
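The idea of coupling consecutive frames in LP analysis can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes a simplified, causal variant in which each frame's autocorrelation-method LP solution is pulled toward the previous frame's coefficients by a quadratic penalty. By Parseval's relation, the squared difference between two coefficient vectors is proportional to the energy of the difference between the corresponding inverse-filter frequency responses. The function names, the Hamming window, the penalty scaling by frame energy, and the parameter `lam` are all assumptions made for illustration.

```python
import numpy as np

def autocorr(frame, order):
    """Autocorrelation lags 0..order of a single frame."""
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])

def time_regularized_lp(frames, order=10, lam=0.1):
    """Hypothetical sequential time-regularized LP (illustrative only).

    frames: array of shape (num_frames, frame_len).
    Returns predictor coefficients a of shape (num_frames, order),
    where x[n] is approximated by sum_k a[k-1] * x[n-k], k = 1..order.
    For each frame after the first, the cost is
        ||x - X a||^2 + mu * ||a - a_prev||^2,  mu = lam * r[0],
    whose normal equations are (R + mu*I) a = r + mu * a_prev.
    """
    frames = np.asarray(frames, dtype=float)
    num_frames, frame_len = frames.shape
    window = np.hamming(frame_len)
    coeffs = np.zeros((num_frames, order))
    prev = np.zeros(order)
    for t in range(num_frames):
        r = autocorr(frames[t] * window, order)
        # Toeplitz autocorrelation matrix built from lags 0..order-1
        idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
        R = r[idx]
        # The first frame has no predecessor, so it is left unregularized
        mu = lam * r[0] if t > 0 else 0.0
        # Tiny ridge keeps the system solvable for near-silent frames
        eps = 1e-9 * r[0] + 1e-12
        A = R + (mu + eps) * np.eye(order)
        b = r[1:] + mu * prev
        coeffs[t] = np.linalg.solve(A, b)
        prev = coeffs[t]
    return coeffs
```

With `lam=0` the sketch reduces to ordinary frame-wise autocorrelation LP; larger values trade responsiveness to genuine spectral change for smoother, less noise-sensitive coefficient trajectories. A joint optimization over all frames, rather than this frame-by-frame pass, would be another reasonable design under the same assumptions.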


Pages: 701-705

Page count: 5
