Robust Voice Activity Detector for Real World Applications Using Harmonicity and Modulation Frequency

Citations: 0
Authors
Chuangsuwanich, Ekapol [1]
Glass, James [1]
Affiliations
[1] MIT Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Keywords
voice activity detection; modulation frequency; harmonicity; human-robot interaction; SPEECH;
DOI
Not available
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The task of robustly detecting distant speech in low-SNR environments for automatic speech recognition (ASR) is examined using a two-stage approach based on two distinguishing features of speech, namely harmonicity and modulation frequency (MF). A modified metric for harmonicity is used as a gating function to a set of parallel classifiers that incorporate MFs computed on different frequency bands. Performance is evaluated both on frame-level discriminative power and on system-level ASR results for a real-world robotic forklift task. Compared to previously proposed features such as relative spectral entropy, and to classification strategies involving MFs, the combined approach shows good generalization across different kinds of dynamic noise conditions and obtains a significant improvement in the false alarm rate at low speech miss rate settings. The overall ASR results also improved significantly compared to the ETSI AMR-VAD2, while reducing the number of false alarms by a factor of two.
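The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the modified harmonicity metric, the frequency bands, or the classifiers, so everything below is an assumption made for illustration. Harmonicity is approximated by the peak of a normalized autocorrelation, the modulation feature is the share of envelope energy between 2 and 8 Hz, each "classifier" is a plain threshold, and the per-band decisions are combined by majority vote. The function names, thresholds, and the choice to reject frames that fail the harmonicity gate are all hypothetical.

```python
import math
import random


def harmonicity(frame, sample_rate, f0_min=80.0, f0_max=400.0):
    """Assumed stand-in metric: peak normalized autocorrelation over pitch lags."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0  # silent frame: no periodicity to measure
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        best = max(best, r)
    return best


def modulation_ratio(envelope, frame_rate, lo=2.0, hi=8.0):
    """Share of non-DC envelope spectrum energy between lo and hi Hz (assumed band)."""
    n = len(envelope)
    mean = sum(envelope) / n
    x = [e - mean for e in envelope]
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2 + 1):  # plain DFT, one bin at a time
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        total += power
        if lo <= k * frame_rate / n <= hi:
            in_band += power
    return in_band / total if total > 0.0 else 0.0


def detect_speech(frame, sample_rate, band_envelopes, frame_rate,
                  harm_threshold=0.5, mod_threshold=0.5):
    """Hypothetical two-stage decision: harmonicity gate, then per-band voting."""
    # Stage 1: frames that fail the harmonicity gate are rejected outright.
    if harmonicity(frame, sample_rate) < harm_threshold:
        return False
    # Stage 2: one toy threshold classifier per frequency band, majority vote.
    votes = [modulation_ratio(env, frame_rate) > mod_threshold
             for env in band_envelopes]
    return 2 * sum(votes) > len(votes)


# Toy inputs: a 200 Hz tone versus seeded white noise (30 ms at 8 kHz),
# and band envelopes modulated at 4 Hz (speech-like) or 30 Hz (not).
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(240)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(240)]
slow = [1 + 0.5 * math.sin(2 * math.pi * 4 * t / 100) for t in range(100)]
fast = [1 + 0.5 * math.sin(2 * math.pi * 30 * t / 100) for t in range(100)]

print(detect_speech(tone, 8000, [slow, slow, fast], 100))
print(detect_speech(noise, 8000, [slow, slow, fast], 100))
```

In this sketch the gate keeps the more expensive per-band analysis from running on frames with no periodic structure. A real system would replace the threshold classifiers with trained models and would derive the band envelopes from a filterbank rather than take them as given.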
Pages: 2656-2659
Page count: 4