Multi-task Learning for End-to-end Noise-robust Bandwidth Extension

Cited by: 11
Authors
Hou, Nana [1]
Xu, Chenglin [1,4]
Zhou, Joey Tianyi [3]
Chng, Eng Siong [1,2]
Li, Haizhou [4,5]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[2] Nanyang Technol Univ, Temasek Labs, Singapore, Singapore
[3] ASTAR, Inst High Performance Comp IHPC, Singapore, Singapore
[4] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[5] Univ Bremen, Machine Listening Lab, Bremen, Germany
Source
Funding
National Research Foundation, Singapore;
Keywords
Noise-robust bandwidth extension; multi-task learning; time-domain masking; temporal convolutional network; NEURAL-NETWORK; SPEECH;
DOI
10.21437/Interspeech.2020-2022
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Bandwidth extension aims to reconstruct wideband speech signals from narrowband inputs to improve perceptual quality. Prior studies mostly perform bandwidth extension under the assumption that the narrowband signals are clean. In practice, signals are often corrupted by noise, which greatly limits the use of such extension techniques. To alleviate this problem, we propose an end-to-end time-domain framework for noise-robust bandwidth extension that jointly optimizes a mask-based speech enhancement module and an ideal bandwidth extension module with multi-task learning. The proposed framework avoids decomposing the signals into magnitude and phase spectra and therefore requires no phase estimation. Experimental results show that the proposed method achieves 14.3% and 15.8% relative improvements over the best baseline in terms of perceptual evaluation of speech quality (PESQ) and log-spectral distortion (LSD), respectively. Furthermore, our method is 3 times more compact than the best baseline in terms of the number of parameters.
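The abstract reports results as relative improvements in PESQ and LSD. As a rough illustration only, the sketch below computes log-spectral distortion using one commonly cited definition (the per-frame root-mean-square difference of log10 power spectra, averaged over frames) together with a relative-improvement helper. The frame length, hop size, Hann window, `eps` floor, and all function names are assumptions made for this example; the record does not state the evaluation settings the authors used.

```python
import numpy as np


def log_power_spectrogram(signal, frame_len=512, hop=256, eps=1e-10):
    """Return log10 power spectra, one row per Hann-windowed frame.

    frame_len, hop and eps are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log10(power + eps)


def log_spectral_distortion(reference, estimate, **kwargs):
    """LSD between two equal-length signals; lower means closer spectra."""
    ref = log_power_spectrogram(reference, **kwargs)
    est = log_power_spectrogram(estimate, **kwargs)
    # RMS difference over frequency bins, then the mean over frames.
    per_frame = np.sqrt(np.mean((ref - est) ** 2, axis=1))
    return float(np.mean(per_frame))


def relative_improvement(baseline, proposed, lower_is_better=False):
    """Fractional gain over a baseline score.

    Use lower_is_better=True for error-style metrics such as LSD and the
    default for quality-style metrics such as PESQ.
    """
    delta = baseline - proposed if lower_is_better else proposed - baseline
    return delta / abs(baseline)
```

Because PESQ is a higher-is-better score and LSD is a lower-is-better distance, the helper flips the sign of the difference so that a positive result always means the proposed system did better than the baseline.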
Pages: 4069 - 4073
Page count: 5
Related papers
50 in total
  • [21] MULTI-TASK AUTOENCODER FOR NOISE-ROBUST SPEECH RECOGNITION
    Zhang, Haoyi
    Liu, Conggui
    Inoue, Nakamasa
    Shinoda, Koichi
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5599 - 5603
  • [22] End-to-End Multi-task Learning Regression Network for Fovea Localization in Fundus Images
    Huang, Limin
    Lei, Haijun
    Liu, Weixin
    Li, Zhen
    Xie, Hai
    Lei, Baiying
    2022 IEEE 35TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS), 2022, : 389 - 393
  • [23] Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction
    Qiu, David
    He, Yanzhang
    Li, Qiujia
    Zhang, Yu
    Gao, Liangliang
    McGraw, Ian
    INTERSPEECH 2021, 2021, : 4074 - 4078
  • [24] SPEECH ENHANCEMENT AIDED END-TO-END MULTI-TASK LEARNING FOR VOICE ACTIVITY DETECTION
    Tan, Xu
    Zhang, Xiao-Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6823 - 6827
  • [25] End-to-end aspect-based sentiment analysis with hierarchical multi-task learning
    Wang, Xinyi
    Xu, Guangluan
    Zhang, Zequn
    Jin, Li
    Sun, Xian
    NEUROCOMPUTING, 2021, 455 : 178 - 188
  • [26] End-to-End Speech Translation With Transcoding by Multi-Task Learning for Distant Language Pairs
    Kano, Takatomo
    Sakti, Sakriani
    Nakamura, Satoshi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1342 - 1355
  • [27] Towards end-to-end Cyberthreat Detection from Twitter using Multi-Task Learning
    Dionisio, Nuno
    Alves, Fernando
    Ferreira, Pedro M.
    Bessani, Alysson
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [28] Multi-objective optimization based multi-task learning for end-to-end license plates recognition
    Zhou X.-J.
    Gao Y.
    Li C.-J.
    Yang C.-H.
    Kongzhi Lilun Yu Yingyong/Control Theory and Applications, 2021, 38 (05): : 676 - 688
  • [29] End-to-end dialogue structure parsing on multi-floor dialogue based on multi-task learning
    Kawano, Seiya
    Yoshino, Koichiro
    Traum, David
    Nakamura, Satoshi
    FRONTIERS IN ROBOTICS AND AI, 2023, 10
  • [30] End-to-end Japanese Multi-dialect Speech Recognition and Dialect Identification with Multi-task Learning
    Imaizumi, Ryo
    Masumura, Ryo
    Shiota, Sayaka
    Kiya, Hitoshi
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2022, 11 (01)