DNN-Based Voice Activity Detection with Multi-Task Learning

Cited by: 31
Authors
Kang, Tae Gyoon [1, 2]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 151742, South Korea
[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 151742, South Korea
Source
IEICE Transactions on Information and Systems
Funding
National Research Foundation, Singapore;
Keywords
deep neural network; voice activity detection; multi-task learning; NETWORKS;
DOI
10.1587/transinf.2015EDL8168
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recently, notable improvements in the voice activity detection (VAD) problem have been achieved by adopting several machine learning techniques. Among them, the deep neural network (DNN), which uses its deep hidden structure to learn the mapping between noisy speech features and the corresponding voice activity status, has been one of the most popular. In this letter, we propose a novel approach that enhances the robustness of the DNN in mismatched noise conditions with a multi-task learning (MTL) framework. In the proposed algorithm, a feature enhancement task for speech features is trained jointly with the conventional VAD task. The experimental results show that the DNN with the proposed framework outperforms the conventional DNN-based VAD algorithm.
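The joint training described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single shared hidden layer, the ReLU activation, the layer sizes, the function names (init_params, forward, mtl_loss) and the loss weight enh_weight are all assumptions chosen for illustration. The sketch only shows the general idea of a shared representation feeding a VAD head (binary cross-entropy) and a feature enhancement head (mean squared error against clean features), with the two losses summed.

```python
import numpy as np


def init_params(n_in, n_hidden, seed=0):
    """Create weights for one shared layer and two task heads (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_shared": rng.normal(0.0, scale, (n_in, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        "W_vad": rng.normal(0.0, scale, (n_hidden, 1)),
        "b_vad": np.zeros(1),
        "W_enh": rng.normal(0.0, scale, (n_hidden, n_in)),
        "b_enh": np.zeros(n_in),
    }


def forward(params, noisy):
    """Return (speech probability per frame, enhanced features per frame)."""
    # Shared hidden representation used by both tasks.
    hidden = np.maximum(0.0, noisy @ params["W_shared"] + params["b_shared"])
    # VAD head: one sigmoid output per frame.
    vad_logit = (hidden @ params["W_vad"] + params["b_vad"])[:, 0]
    vad_prob = 1.0 / (1.0 + np.exp(-vad_logit))
    # Enhancement head: linear regression back to the feature dimension.
    enhanced = hidden @ params["W_enh"] + params["b_enh"]
    return vad_prob, enhanced


def mtl_loss(params, noisy, clean, labels, enh_weight=0.5):
    """Combined loss: VAD cross-entropy plus weighted enhancement MSE."""
    vad_prob, enhanced = forward(params, noisy)
    eps = 1e-12  # guards the logarithms against log(0)
    bce = -np.mean(
        labels * np.log(vad_prob + eps)
        + (1.0 - labels) * np.log(1.0 - vad_prob + eps)
    )
    mse = np.mean((enhanced - clean) ** 2)
    return bce + enh_weight * mse, bce, mse


# Toy usage with synthetic data (8 frames, 20-dimensional features).
rng = np.random.default_rng(1)
clean = rng.normal(size=(8, 20))
noisy = clean + 0.3 * rng.normal(size=(8, 20))
labels = rng.integers(0, 2, size=8).astype(float)
params = init_params(n_in=20, n_hidden=16)
total, bce, mse = mtl_loss(params, noisy, clean, labels)
```

In a setup like this, only the VAD head would be needed at test time; the enhancement head acts as an auxiliary training signal. Setting enh_weight to 0 reduces the objective to a single-task VAD loss.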
Pages: 550-553
Page count: 4
Related papers
50 in total
  • [1] Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning
    Hu, Qiong
    Wu, Zhizheng
    Richmond, Korin
    Yamagishi, Junichi
    Stylianou, Yannis
    Maia, Ranniery
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 854 - 858
  • [2] Multi-task learning and Weighted Cross-entropy for DNN-based Keyword Spotting
    Panchapagesan, Sankaran
    Sun, Ming
    Khare, Aparna
    Matsoukas, Spyros
    Mandal, Arindam
    Hoffmeister, Bjorn
    Vitaladevuni, Shiv
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 760 - 764
  • [3] MULTI-TASK LEARNING FOR VOICE TRIGGER DETECTION
    Sigtia, Siddharth
    Clark, Pascal
    Haynes, Rob
    Richards, Hywel
    Bridle, John
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7449 - 7453
  • [4] Multi-Task Joint-Learning for Robust Voice Activity Detection
    Zhuang, Yimeng
    Tong, Sibo
    Yin, Maofan
    Qian, Yanmin
    Yu, Kai
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [5] WAVELET-BASED DECOMPOSITION OF F0 AS A SECONDARY TASK FOR DNN-BASED SPEECH SYNTHESIS WITH MULTI-TASK LEARNING
    Ribeiro, Manuel Sam
    Watts, Oliver
    Yamagishi, Junichi
    Clark, Robert A. J.
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5525 - 5529
  • [6] DNN-Based Voice Activity Detection with Local Feature Shift Technique
    Kang, Tae Gyoon
    Lee, Kang Hyun
    Kang, Woo Hyun
    Bae, Soo Hyun
    Kim, Nam Soo
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [7] VOICE TOXICITY DETECTION USING MULTI-TASK LEARNING
    Nandwana, Mahesh Kumar
    He, Yifan
    Liu, Joseph
    Yu, Xiao
    Shang, Charles
    Du Bois, Eloi
    McGuire, Morgan
    Bhat, Kiran
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 331 - 335
  • [8] Multi-lingual and Multi-task DNN Learning for Articulatory Error Detection
    Duan, Richeng
    Kawahara, Tatsuya
    Dantsuji, Masatake
    Zhang, Jinsong
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [9] MULTI-TASK LEARNING FOR SPEAKER VERIFICATION AND VOICE TRIGGER DETECTION
    Sigtia, Siddharth
    Marchi, Erik
    Kajarekar, Sachin
    Naik, Devang
    Bridle, John
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6844 - 6848
  • [10] DNN-BASED VOICE ACTIVITY DETECTION USING AUXILIARY SPEECH MODELS IN NOISY ENVIRONMENTS
    Tachioka, Yuuki
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5529 - 5533