A DUAL-STAGED CONTEXT AGGREGATION METHOD TOWARDS EFFICIENT END-TO-END SPEECH ENHANCEMENT

Cited: 0
Authors
Zhen, Kai [1 ,2 ]
Lee, Mi Suk [3 ]
Kim, Minje [1 ,2 ]
Affiliations
[1] Indiana Univ, Luddy Sch Informat Comp & Engn, Bloomington, IN 47405 USA
[2] Indiana Univ, Cognit Sci Program, Bloomington, IN 47405 USA
[3] Elect & Telecommun Res Inst, Daejeon, South Korea
Keywords
End-to-end; speech enhancement; context aggregation; residual learning; dilated convolution; recurrent network; noise
DOI
10.1109/icassp40776.2020.9054499
CLC Classification
O42 [Acoustics]
Subject Classification
070206; 082403
Abstract
In speech enhancement, an end-to-end deep neural network converts a noisy speech signal directly to clean speech in the time domain, without time-frequency transformation or mask estimation. However, aggregating contextual information from a high-resolution time-domain signal at an affordable model complexity remains challenging. In this paper, we propose a densely connected convolutional and recurrent network (DCCRN), a hybrid architecture, to enable dual-staged temporal context aggregation. With dense connectivity and a cross-component identical shortcut, DCCRN consistently outperforms competing convolutional baselines, with average improvements of 0.23 in STOI and 1.38 in PESQ across three SNR levels. The proposed method is computationally efficient, with only 1.38 million parameters. Generalization to unseen noise types remains decent given the model's low complexity, although it is weaker than that of Wave-U-Net, which has 7.25 times more parameters.
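The abstract describes a two-stage design: a convolutional stage with dense connectivity that aggregates local temporal context from the raw waveform, followed by a recurrent stage for longer-range context, with a shortcut connecting the two components. The sketch below is a minimal, hypothetical PyTorch illustration of that general idea, not the authors' released code or exact architecture; the class name DualStageContextSketch and all hyperparameters (channel width, kernel size, dilation rates, block count) are illustrative assumptions.

```python
# Hypothetical sketch of dual-staged context aggregation (not the paper's implementation).
# Stage 1: densely connected dilated 1-D convolutions aggregate local context
#          directly on the raw waveform.
# Stage 2: a GRU aggregates longer-range context; an identity shortcut from
#          the convolutional stage is added back after the recurrent stage.
import torch
import torch.nn as nn


class DualStageContextSketch(nn.Module):
    def __init__(self, channels=32, num_conv_blocks=4):
        super().__init__()
        self.input_proj = nn.Conv1d(1, channels, kernel_size=5, padding=2)
        # Dense connectivity: block i consumes the concatenation of the
        # input projection and all previous block outputs.
        self.conv_blocks = nn.ModuleList()
        for i in range(num_conv_blocks):
            dilation = 2 ** i
            self.conv_blocks.append(nn.Sequential(
                nn.Conv1d(channels * (i + 1), channels, kernel_size=5,
                          padding=2 * dilation, dilation=dilation),
                nn.PReLU(),
            ))
        self.bottleneck = nn.Conv1d(channels * (num_conv_blocks + 1),
                                    channels, kernel_size=1)
        self.gru = nn.GRU(channels, channels, batch_first=True)
        self.output_proj = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, noisy):                  # noisy: (batch, 1, samples)
        feats = [self.input_proj(noisy)]
        for block in self.conv_blocks:
            feats.append(block(torch.cat(feats, dim=1)))
        conv_out = self.bottleneck(torch.cat(feats, dim=1))
        rnn_out, _ = self.gru(conv_out.transpose(1, 2))   # (batch, T, channels)
        # Cross-component identity shortcut between the two stages.
        merged = rnn_out.transpose(1, 2) + conv_out
        return self.output_proj(merged)        # enhanced: (batch, 1, samples)


# Example: enhance two 1-second clips sampled at 16 kHz.
model = DualStageContextSketch()
enhanced = model(torch.randn(2, 1, 16000))
print(enhanced.shape)                          # torch.Size([2, 1, 16000])
```

The dense concatenation of earlier feature maps and the post-recurrent shortcut mirror, at a schematic level, the dense connectivity and cross-component shortcut mentioned in the abstract; the actual DCCRN configuration is described in the paper itself.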
Pages: 366 - 370
Page count: 5
Related papers
50 records in total
  • [41] End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
    Chang, Xuankai
    Maekaku, Takashi
    Fujita, Yuya
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3819 - 3823
  • [42] An end-to-end perceptual enhancement method for UHD portrait images
    Yang, Ying
    Yang, Mengning
    Zhang, Xin
    IET IMAGE PROCESSING, 2022, 16 (07) : 1988 - 2000
  • [43] Beyond Sentence-Level End-to-End Speech Translation: Context Helps
    Zhang, Biao
    Titov, Ivan
    Haddow, Barry
    Sennrich, Rico
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2566 - 2578
  • [44] Transformer-based Long-context End-to-end Speech Recognition
    Hori, Takaaki
    Moritz, Niko
    Hori, Chiori
    Le Roux, Jonathan
    INTERSPEECH 2020, 2020, : 5011 - 5015
  • [45] Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion
    Kim, Suyoun
    Dalmia, Siddharth
    Metze, Florian
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1131 - 1141
  • [46] STREAMING END-TO-END SPEECH RECOGNITION WITH JOINTLY TRAINED NEURAL FEATURE ENHANCEMENT
    Kim, Chanwoo
    Garg, Abhinav
    Gowda, Dhananjaya
    Mun, Seongkyu
    Han, Changwoo
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6773 - 6777
  • [47] End-to-End Speech Enhancement Using Fully Convolutional Networks with Skip Connections
    Wang, Dujuan
    Bao, Changchun
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 890 - 895
  • [48] NEURAL NOISE EMBEDDING FOR END-TO-END SPEECH ENHANCEMENT WITH CONDITIONAL LAYER NORMALIZATION
    Zhang, Zhihui
    Li, Xiaoqi
    Li, Yaxing
    Dong, Yuanjie
    Wang, Dan
    Xiong, Shengwu
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7113 - 7117
  • [49] Perception-guided generative adversarial network for end-to-end speech enhancement
    Li, Yihao
    Sun, Meng
    Zhang, Xiongwei
    APPLIED SOFT COMPUTING, 2022, 128
  • [50] A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement
    Borgstrom, Bengt J.
    Brandstein, Michael S.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2418 - 2431