Parameter Tuning-Free Missing-Feature Reconstruction for Robust Sound Recognition

Cited by: 4
Authors
Liu, Qi [1]
Wu, Jibin [1]
Affiliations
[1] National University of Singapore, Department of Electrical and Computer Engineering, Singapore 119077, Singapore
Funding
National Research Foundation, Singapore
Keywords
Spectrogram; Matrix decomposition; Acoustics; Task analysis; Computational modeling; Speech recognition; Tuning; Missing-feature reconstruction; matrix factorization; deep neural networks (DNNs); automatic speech recognition (ASR); environmental sound classification; AUTOMATIC SPEECH RECOGNITION; MATRIX COMPLETION; FEATURE-EXTRACTION; ALGORITHM; RECOVERY; OPTIMIZATION;
DOI
10.1109/JSTSP.2020.3038054
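Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract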
With the advent of deep neural networks, automatic speech recognition (ASR) has improved significantly in recent years. However, ASR performance degrades rapidly when the acoustic environment, such as the communication channel or background noise, differs from that of the training data. In the missing-feature approach to speech processing, unreliable feature components are identified and reconstructed to overcome signal degradation and the mismatch of the acoustic environment. To reduce model dependency, we investigate matrix completion techniques for missing-feature reconstruction. However, most matrix completion techniques require parameters to be tuned a priori, e.g., the target rank, which is hard to determine in practice. In this work, we propose a matrix completion method based on matrix factorization for the missing-feature reconstruction task that requires neither model training nor parameter tuning. Experiments show superior feature reconstruction performance and computational efficiency in both speech recognition and environmental sound classification tasks.
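To make the idea of reconstructing unreliable features by matrix completion concrete, here is a minimal sketch using generic alternating least squares on a masked matrix. It is not the authors' algorithm: it takes a fixed `rank` and a regularization weight `reg`, which are exactly the kind of hand-set parameters the abstract says the proposed method avoids. The function name `complete_low_rank`, the synthetic data, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def complete_low_rank(X, mask, rank=2, n_iters=50, reg=1e-3, seed=0):
    """Fill entries where mask is False using a rank-`rank` factorization U @ V.T.

    Generic alternating least squares; `rank` and `reg` are hand-set here,
    unlike the tuning-free method described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    U = rng.standard_normal((n_rows, rank))
    V = rng.standard_normal((n_cols, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # Update each row factor from the reliable entries of that row only.
        for i in range(n_rows):
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ X[i, obs])
        # Update each column factor from the reliable entries of that column only.
        for j in range(n_cols):
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ X[obs, j])
    # Keep reliable entries as they are; replace only the unreliable ones.
    return np.where(mask, X, U @ V.T)


# Synthetic stand-in for a spectrogram: an exactly rank-2 matrix (frequency x time).
rng = np.random.default_rng(0)
truth = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 30))
mask = rng.random(truth.shape) < 0.7      # True marks a reliable entry
observed = np.where(mask, truth, 0.0)     # unreliable entries zeroed out
filled = complete_low_rank(observed, mask, rank=2)
```

In a missing-feature pipeline, `mask` would come from a reliability estimate over time-frequency bins, and the rank would have to be guessed or cross-validated in this generic formulation, which is the practical difficulty the abstract points to.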
Pages: 78-89
Page count: 12