A DUAL-STAGED CONTEXT AGGREGATION METHOD TOWARDS EFFICIENT END-TO-END SPEECH ENHANCEMENT

Cited: 0
Authors
Zhen, Kai [1 ,2 ]
Lee, Mi Suk [3 ]
Kim, Minje [1 ,2 ]
Affiliations
[1] Indiana Univ, Luddy Sch Informat Comp & Engn, Bloomington, IN 47405 USA
[2] Indiana Univ, Cognit Sci Program, Bloomington, IN 47405 USA
[3] Elect & Telecommun Res Inst, Daejeon, South Korea
Keywords
End-to-end; speech enhancement; context aggregation; residual learning; dilated convolution; recurrent network; noise
DOI
10.1109/icassp40776.2020.9054499
CLC Classification
O42 [Acoustics]
Subject Classification
070206; 082403
Abstract
In speech enhancement, an end-to-end deep neural network converts a noisy speech signal directly to clean speech in the time domain, without time-frequency transformation or mask estimation. However, aggregating contextual information from a high-resolution time-domain signal at an affordable model complexity remains challenging. In this paper, we propose a densely connected convolutional and recurrent network (DCCRN), a hybrid architecture, to enable dual-staged temporal context aggregation. With dense connectivity and a cross-component identical shortcut, DCCRN consistently outperforms competing convolutional baselines, with average improvements of 0.23 in STOI and 1.38 in PESQ across three SNR levels. The proposed method is computationally efficient, with only 1.38 million parameters. Generalization to unseen noise types remains decent given the model's low complexity, although it is weaker than that of Wave-U-Net, which has 7.25 times more parameters.
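The abstract describes a two-stage design: a convolutional stage with dense connectivity that aggregates local temporal context from the raw waveform, followed by a recurrent stage for longer-range context, with a shortcut connecting the two components. The sketch below is a minimal, hypothetical PyTorch illustration of that general idea, not the authors' released code or exact architecture; the class name DualStageContextSketch and all hyperparameters (channel width, kernel size, dilation rates, block count) are illustrative assumptions.

```python
# Hypothetical sketch of dual-staged context aggregation (not the paper's implementation).
# Stage 1: densely connected dilated 1-D convolutions aggregate local context
#          directly on the raw waveform.
# Stage 2: a GRU aggregates longer-range context; an identity shortcut from
#          the convolutional stage is added back after the recurrent stage.
import torch
import torch.nn as nn


class DualStageContextSketch(nn.Module):
    def __init__(self, channels=32, num_conv_blocks=4):
        super().__init__()
        self.input_proj = nn.Conv1d(1, channels, kernel_size=5, padding=2)
        # Dense connectivity: block i consumes the concatenation of the
        # input projection and all previous block outputs.
        self.conv_blocks = nn.ModuleList()
        for i in range(num_conv_blocks):
            dilation = 2 ** i
            self.conv_blocks.append(nn.Sequential(
                nn.Conv1d(channels * (i + 1), channels, kernel_size=5,
                          padding=2 * dilation, dilation=dilation),
                nn.PReLU(),
            ))
        self.bottleneck = nn.Conv1d(channels * (num_conv_blocks + 1),
                                    channels, kernel_size=1)
        self.gru = nn.GRU(channels, channels, batch_first=True)
        self.output_proj = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, noisy):                  # noisy: (batch, 1, samples)
        feats = [self.input_proj(noisy)]
        for block in self.conv_blocks:
            feats.append(block(torch.cat(feats, dim=1)))
        conv_out = self.bottleneck(torch.cat(feats, dim=1))
        rnn_out, _ = self.gru(conv_out.transpose(1, 2))   # (batch, T, channels)
        # Cross-component identity shortcut between the two stages.
        merged = rnn_out.transpose(1, 2) + conv_out
        return self.output_proj(merged)        # enhanced: (batch, 1, samples)


# Example: enhance two 1-second clips sampled at 16 kHz.
model = DualStageContextSketch()
enhanced = model(torch.randn(2, 1, 16000))
print(enhanced.shape)                          # torch.Size([2, 1, 16000])
```

The dense concatenation of earlier feature maps and the post-recurrent shortcut mirror, at a schematic level, the dense connectivity and cross-component shortcut mentioned in the abstract; the actual DCCRN configuration is described in the paper itself.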
Pages: 366 - 370
Page count: 5
Related papers
50 records in total
  • [41] End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
    Chang, Xuankai
    Maekaku, Takashi
    Fujita, Yuya
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3819 - 3823
  • [42] An end-to-end perceptual enhancement method for UHD portrait images
    Yang, Ying
    Yang, Mengning
    Zhang, Xin
    IET IMAGE PROCESSING, 2022, 16 (07) : 1988 - 2000
  • [43] Beyond Sentence-Level End-to-End Speech Translation: Context Helps
    Zhang, Biao
    Titov, Ivan
    Haddow, Barry
    Sennrich, Rico
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2566 - 2578
  • [44] Transformer-based Long-context End-to-end Speech Recognition
    Hori, Takaaki
    Moritz, Niko
    Hori, Chiori
    Le Roux, Jonathan
    INTERSPEECH 2020, 2020, : 5011 - 5015
  • [45] Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion
    Kim, Suyoun
    Dalmia, Siddharth
    Metze, Florian
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1131 - 1141
  • [46] STREAMING END-TO-END SPEECH RECOGNITION WITH JOINTLY TRAINED NEURAL FEATURE ENHANCEMENT
    Kim, Chanwoo
    Garg, Abhinav
    Gowda, Dhananjaya
    Mun, Seongkyu
    Han, Changwoo
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6773 - 6777
  • [47] End-to-End Speech Enhancement Using Fully Convolutional Networks with Skip Connections
    Wang, Dujuan
    Bao, Changchun
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 890 - 895
  • [48] NEURAL NOISE EMBEDDING FOR END-TO-END SPEECH ENHANCEMENT WITH CONDITIONAL LAYER NORMALIZATION
    Zhang, Zhihui
    Li, Xiaoqi
    Li, Yaxing
    Dong, Yuanjie
    Wang, Dan
    Xiong, Shengwu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7113 - 7117
  • [49] Perception-guided generative adversarial network for end-to-end speech enhancement
    Li, Yihao
    Sun, Meng
    Zhang, Xiongwei
    APPLIED SOFT COMPUTING, 2022, 128
  • [50] A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement
    Borgstrom, Bengt J.
    Brandstein, Michael S.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2418 - 2431