Parameter Tuning-Free Missing-Feature Reconstruction for Robust Sound Recognition

Cited by: 4
Authors
Liu, Qi [1]
Wu, Jibin [1]
Affiliations
[1] National University of Singapore, Department of Electrical and Computer Engineering, Singapore 119077, Singapore
Funding
National Research Foundation, Singapore
Keywords
Spectrogram; Matrix decomposition; Acoustics; Task analysis; Computational modeling; Speech recognition; Tuning; Missing-feature reconstruction; matrix factorization; deep neural networks (DNNs); automatic speech recognition (ASR); environmental sound classification; AUTOMATIC SPEECH RECOGNITION; MATRIX COMPLETION; FEATURE-EXTRACTION; ALGORITHM; RECOVERY; OPTIMIZATION;
DOI
10.1109/JSTSP.2020.3038054
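Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract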
With the advent of deep neural networks, automatic speech recognition (ASR) has improved significantly in recent years. However, ASR performance degrades rapidly when the acoustic environment, such as the communication channel or background noise, differs from that of the training data. In the missing-feature approach to speech processing, unreliable feature components are identified and reconstructed to overcome signal degradation and acoustic-environment mismatch. To reduce model dependency, we investigate matrix completion techniques for missing-feature reconstruction. However, most matrix completion techniques require parameters to be tuned a priori, e.g., the target rank, which is hard to determine in practice. In this work, we propose a matrix completion method based on matrix factorization for the missing-feature reconstruction task that requires neither model training nor parameter tuning. Experiments show superior feature reconstruction performance and computational efficiency in both speech recognition and environmental sound classification tasks.
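As a rough illustration of the general idea behind missing-feature reconstruction by matrix completion, the sketch below fills the unreliable cells of a synthetic low-rank "spectrogram" using masked alternating least squares. It is not the method proposed in the paper: the function name `complete_low_rank`, the fixed `rank`, the regularizer `reg`, and the synthetic data are all assumptions made for illustration, and the fixed `rank` is exactly the kind of a priori parameter the paper aims to avoid.

```python
import numpy as np

def complete_low_rank(X, mask, rank=2, n_iters=50, reg=1e-3, seed=0):
    """Fill entries of X where mask is False using a rank-`rank` factorization.

    Hypothetical helper: fits X ~= U @ V.T on the reliable entries only
    (mask == True) by alternating regularized least squares.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    U = rng.standard_normal((n_rows, rank))
    V = rng.standard_normal((n_cols, rank))
    ridge = reg * np.eye(rank)  # keeps each small linear system well posed

    for _ in range(n_iters):
        # Update each row factor from the reliable entries in that row.
        for i in range(n_rows):
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ X[i, obs])
        # Update each column factor from the reliable entries in that column.
        for j in range(n_cols):
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ X[obs, j])

    # Keep reliable entries as observed; reconstruct only the unreliable ones.
    return np.where(mask, X, U @ V.T)

# Synthetic example: an exactly rank-2 matrix with about 30% of cells unreliable.
rng = np.random.default_rng(1)
X_true = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 60))
mask = rng.random(X_true.shape) < 0.7          # True marks a reliable cell
X_rec = complete_low_rank(np.where(mask, X_true, 0.0), mask, rank=2)

rel_err = np.linalg.norm((X_rec - X_true)[~mask]) / np.linalg.norm(X_true[~mask])
print(f"relative error on reconstructed entries: {rel_err:.3e}")
```

In this sketch the reliability mask is given; in a real system it would have to be estimated from the noisy signal, and real spectrograms are only approximately low rank, so the choice of `rank` would matter far more than it does on this synthetic data.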
Pages: 78-89
Number of pages: 12
Related papers
50 in total
  • [31] A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition
    Kim, Wooil
    Hansen, John H. L.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (05): : 1434 - 1443
  • [32] Selective Gammatone Envelope Feature for Robust Sound Event Recognition
    Leng, Yi Ren
    Huy Dat Tran
    Kitaoka, Norihide
    Li, Haizhou
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (05): : 1229 - 1237
  • [33] Selective Gammatone Filterbank Feature for Robust Sound Event Recognition
    Leng, Yi Ren
    Huy Dat Tran
    Kitaoka, Norihide
    Li, Haizhou
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2246 - +
  • [34] Hard-Mask Missing Feature Theory for Robust Speaker Recognition
    Lim, Shin-Cheol
    Jang, Sei-Jin
    Lee, Soek-Pil
    Kim, Moo Young
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2011, 57 (03) : 1245 - 1250
  • [35] A proximal-proximal majorization-minimization algorithm for nonconvex tuning-free robust regression problems
    Tang, Peipei
    Wang, Chengjing
    Jiang, Bo
    arXiv, 2021,
  • [36] Local Graph Reconstruction for Parameter Free Unsupervised Feature Selection
    Du, Liang
    Ren, Chaohong
    Lv, Xiaolin
    Chen, Yan
    Zhou, Peng
    Hu, Zhiguo
    IEEE ACCESS, 2019, 7 : 102921 - 102930
  • [37] Reinforcement Learning Based Tuning-free Plug-and-Play Image Reconstruction Method for Single Photon Imaging
    Chen, Shuang
    Tian, Ye
    Fu, Ying
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2024, 52 (10): : 3600 - 3612
  • [38] Robust speaker identification using combined feature selection and missing data recognition
    Pullella, Daniel
    Kuehne, Marco
    Togneri, Roberto
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4833 - 4836
  • [39] Missing feature theory applied to robust speech recognition over IP network
    Endo, T
    Kuroiwa, S
    Nakamura, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (05): : 1119 - 1126
  • [40] Feature classification criterion for missing features mask estimation in robust speaker recognition
    Ribas Gonzalez, Dayana
    Calvo de Lara, Jose Ramon
    SIGNAL IMAGE AND VIDEO PROCESSING, 2014, 8 (02) : 365 - 375