TIME-FREQUENCY MASKING-BASED SPEECH ENHANCEMENT USING GENERATIVE ADVERSARIAL NETWORK

Cited by: 0
Authors
Soni, Meet H. [1]
Shah, Neil [1]
Patil, Hemant A. [1]
Affiliations
[1] Dhirubhai Ambani Inst Informat & Commun Technol, Gandhinagar, India
Keywords
Task-dependent masking; speech enhancement; generative adversarial networks
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The success of time-frequency (T-F) mask-based approaches depends on how accurately the mask is predicted from the noisy spectral features. State-of-the-art methods in T-F masking-based enhancement employ a Deep Neural Network (DNN) to predict the mask. Recently, Generative Adversarial Networks (GANs) have been gaining popularity as an alternative to maximum likelihood (ML)-based optimization of deep learning architectures. In this paper, we propose to exploit a GAN in the T-F masking-based enhancement framework. We present a viable strategy for using a GAN in such an application by modifying the existing approach. To achieve this, we use a method that learns the mask implicitly while predicting the clean T-F representation. Moreover, we show that a vanilla GAN fails to predict an accurate mask, and we propose a regularized objective function that uses the Mean Square Error (MSE) between the predicted and target spectra to overcome this. Objective evaluation of the proposed method shows improved mask prediction accuracy compared with state-of-the-art ML-based optimization techniques. The proposed system significantly improves speech quality over a recent GAN-based speech enhancement system, while maintaining a better trade-off between low speech distortion and effective removal of the background interference present in the noisy mixture.
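The two ideas named in the abstract, an implicitly learned mask and an MSE-regularized adversarial objective, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sigmoid mask, the non-saturating form of the adversarial term, the weight `lam`, and all function names are assumptions made for illustration only, since the abstract does not specify them.

```python
import numpy as np

def sigmoid(x):
    """Squash real-valued activations into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def implicit_mask_enhance(noisy_spec, mask_logits):
    """Multiply the noisy spectrum by a mask in (0, 1).

    The mask is an internal quantity: training targets the clean
    spectrum, so the mask is learned implicitly (assumed design).
    """
    mask = sigmoid(mask_logits)
    return mask * noisy_spec, mask

def regularized_generator_loss(d_fake, enhanced, clean, lam=1.0):
    """Adversarial term plus lam * MSE between enhanced and clean spectra.

    d_fake: discriminator probabilities for the enhanced spectra.
    lam: hypothetical regularization weight, not taken from the paper.
    """
    eps = 1e-12  # avoids log(0)
    adv = -np.mean(np.log(d_fake + eps))    # non-saturating GAN loss (assumption)
    mse = np.mean((enhanced - clean) ** 2)  # spectral regularizer
    return adv + lam * mse, adv, mse

# Toy usage with random stand-ins for spectra and network outputs.
rng = np.random.default_rng(0)
clean = rng.random((4, 8))                # 4 frames x 8 frequency bins
noisy = clean + rng.random((4, 8))        # additive non-negative "noise"
mask_logits = rng.standard_normal((4, 8)) # stand-in for generator output
enhanced, mask = implicit_mask_enhance(noisy, mask_logits)
d_fake = np.full(4, 0.5)                  # stand-in discriminator scores
total, adv, mse = regularized_generator_loss(d_fake, enhanced, clean, lam=10.0)
```

With `lam = 0` the sketch reduces to a purely adversarial objective, the setting the abstract reports as failing to predict an accurate mask; the MSE term ties the enhanced spectrum to the target directly.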
Pages: 5039-5043
Number of pages: 5
Related papers
50 in total
  • [41] Time-Frequency Masking Based Online Multi-Channel Speech Enhancement With Convolutional Recurrent Neural Networks
    Chakrabarty, Soumitro
    Habets, Emanuel A. P.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 787 - 799
  • [42] A Loss With Mixed Penalty for Speech Enhancement Generative Adversarial Network
    Cao, Jie
    Zhou, Yaofeng
    Yu, Hong
    Li, Xiaoxu
    Wang, Dan
    Ma, Zhanyu
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 86 - 90
  • [43] DENSELY CONNECTED NETWORK WITH TIME-FREQUENCY DILATED CONVOLUTION FOR SPEECH ENHANCEMENT
    Li, Yaxing
    Li, Xiaoqi
    Dong, Yuanjie
    Li, Meng
    Xu, Shan
    Xiong, Shengwu
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6860 - 6864
  • [44] Noise estimation based on time-frequency correlation for speech enhancement
    Yuan, Wenhao
    Lin, Jiajun
    An, Wei
    Wang, Yu
    Chen, Ning
    APPLIED ACOUSTICS, 2013, 74 (05) : 770 - 781
  • [45] Speech Dereverberation Based on Generative Adversarial Network with Additive Frequency Domain Decomposition
    Quan H.
    Wang T.
    Zheng Z.
    Gongcheng Kexue Yu Jishu/Advanced Engineering Sciences, 2022, 54 (02) : 180 - 187
  • [46] Maximizing environmental sound recognition and speech intelligibility using time-frequency masking
    Johnson, Eric M.
    Healy, Eric W.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (03):
  • [47] On-line Speech Enhancement by Time-Frequency Masking under Prior Knowledge of Source Location
    Kang, Min Ah
    Jeong, Sangbae
    Hahn, Minsoo
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 25, 2007, 25 : 116 - 121
  • [48] Speech endpoint detection based on speech time-frequency enhancement and spectral entropy
    Fan Yingle
    Li Yi
    Wu Chuanyan
    2005 27TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-7, 2005, : 4682 - 4684
  • [49] Speech Enhancement Using Generative Adversarial Network by Distilling Knowledge from Statistical Method
    Wu, Jianfeng
    Hua, Yongzhu
    Yang, Shengying
    Qin, Hongshuai
    Qin, Huibin
    APPLIED SCIENCES-BASEL, 2019, 9 (16):
  • [50] TIME-FREQUENCY ATTENTION FOR MONAURAL SPEECH ENHANCEMENT
    Zhang, Qiquan
    Song, Qi
    Ni, Zhaoheng
    Nicolson, Aaron
    Li, Haizhou
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7852 - 7856