A DUAL-STAGED CONTEXT AGGREGATION METHOD TOWARDS EFFICIENT END-TO-END SPEECH ENHANCEMENT

Cited: 0
Authors
Zhen, Kai [1 ,2 ]
Lee, Mi Suk [3 ]
Kim, Minje [1 ,2 ]
Affiliations
[1] Indiana Univ, Luddy Sch Informat Comp & Engn, Bloomington, IN 47405 USA
[2] Indiana Univ, Cognit Sci Program, Bloomington, IN 47405 USA
[3] Elect & Telecommun Res Inst, Daejeon, South Korea
Keywords
End-to-end; speech enhancement; context aggregation; residual learning; dilated convolution; recurrent network; noise
DOI
10.1109/icassp40776.2020.9054499
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
In speech enhancement, an end-to-end deep neural network converts a noisy speech signal directly into clean speech in the time domain, without a time-frequency transformation or mask estimation. However, aggregating contextual information from a high-resolution time-domain signal at an affordable model complexity remains challenging. In this paper, we propose a densely connected convolutional and recurrent network (DCCRN), a hybrid architecture that enables dual-staged temporal context aggregation. With its dense connectivity and cross-component identical shortcut, DCCRN consistently outperforms competing convolutional baselines, with an average STOI improvement of 0.23 and PESQ of 1.38 across three SNR levels. The proposed method is computationally efficient, with only 1.38 million parameters. Its generalization to unseen noise types is still decent given its low complexity, although it is weaker than Wave-U-Net, which has 7.25 times more parameters.
Pages: 366-370
Page count: 5
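The abstract describes dual-staged temporal context aggregation: densely connected dilated convolutions gather local context, a recurrent component then captures longer-range dependencies, and an identical shortcut carries the input across both components. The PyTorch sketch below illustrates that general idea only; the layer count, channel width, kernel size, dilation schedule, GRU size, and the DualStageContextBlock name are illustrative assumptions and do not reproduce the exact DCCRN configuration or its 1.38 million-parameter budget.

```python
# Minimal sketch of dual-staged temporal context aggregation.
# All hyperparameters below are illustrative assumptions, not the
# published DCCRN configuration.
import torch
import torch.nn as nn


class DualStageContextBlock(nn.Module):
    """Stage 1: densely connected dilated 1-D convolutions (local context).
    Stage 2: a GRU over the convolutional features (long-range context).
    A cross-component identity shortcut adds the block input to the output."""

    def __init__(self, channels: int = 32, num_conv_layers: int = 4, hidden: int = 32):
        super().__init__()
        self.convs = nn.ModuleList()
        for i in range(num_conv_layers):
            dilation = 2 ** i  # exponentially growing receptive field
            self.convs.append(
                nn.Sequential(
                    nn.Conv1d(
                        channels * (i + 1), channels, kernel_size=3,
                        dilation=dilation, padding=dilation,
                    ),
                    nn.PReLU(),
                )
            )
        self.gru = nn.GRU(channels, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        features = [x]
        for conv in self.convs:
            # Dense connectivity: each layer sees all previous feature maps.
            features.append(conv(torch.cat(features, dim=1)))
        h = features[-1].transpose(1, 2)   # (batch, time, channels)
        h, _ = self.gru(h)                 # recurrent context aggregation
        h = self.proj(h).transpose(1, 2)   # back to (batch, channels, time)
        return x + h                       # identity shortcut across components


if __name__ == "__main__":
    block = DualStageContextBlock()
    dummy = torch.randn(2, 32, 1000)       # dummy feature maps, short for a quick check
    print(block(dummy).shape)              # torch.Size([2, 32, 1000])
```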
Related Papers
50 records in total
  • [21] UNIFIED END-TO-END SPEECH RECOGNITION AND ENDPOINTING FOR FAST AND EFFICIENT SPEECH SYSTEMS
    Bijwadia, Shaan
    Chang, Shuo-yiin
    Li, Bo
    Sainath, Tara
    Zhang, Chao
    He, Yanzhang
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 310 - 316
  • [22] Do End-to-End Speech Recognition Models Care About Context?
    Borgholt, Lasse
    Havtorn, Jakob D.
    Agic, Zeljko
    Sogaard, Anders
    Maaloe, Lars
    Igel, Christian
    INTERSPEECH 2020, 2020, : 4352 - 4356
  • [23] Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis
    Yang, Fengyu
    Yang, Shan
    Wu, Qinghua
    Wang, Yujun
    Xie, Lei
    INTERSPEECH 2020, 2020, : 3436 - 3440
  • [24] Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition
    Liu, Bin
    Nie, Shuai
    Liang, Shan
    Liu, Wenju
    Yu, Meng
    Chen, Lianwu
    Peng, Shouye
    Li, Changliang
    INTERSPEECH 2019, 2019, : 491 - 495
  • [25] Towards a Method for end-to-end SDN App Development
    Stritzke, Christian
    Priesterjahn, Claudia
    Aranda Gutierrez, Pedro A.
    2015 FOURTH EUROPEAN WORKSHOP ON SOFTWARE DEFINED NETWORKS - EWSDN 2015, 2015, : 107 - 108
  • [26] Towards Paralinguistic-Only Speech Representations for End-to-End Speech Emotion Recognition
    Ioannides, Georgios
    Owen, Michael
    Fletcher, Andrew
    Rozgic, Viktor
    Wang, Chao
    INTERSPEECH 2023, 2023, : 1853 - 1857
  • [27] A Flow Aggregation Method Based on End-to-End Delay in SDN
    Kosugiyama, Takuya
    Tanabe, Kazuki
    Nakayama, Hiroki
    Hayashi, Tsunemasa
    Yamaoka, Katsunori
    2017 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2017,
  • [28] Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
    Zhang, Ying
    Pezeshki, Mohammad
    Brakel, Philemon
    Zhang, Saizheng
    Laurent, Cesar
    Bengio, Yoshua
    Courville, Aaron
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 410 - 414
  • [29] Exploring end-to-end framework towards Khasi speech recognition system
    Bronson Syiem
    L. Joyprakash Singh
    International Journal of Speech Technology, 2021, 24 : 419 - 424
  • [30] TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition
    Yoon, Ji Won
    Lee, Hyeonseung
    Kim, Hyung Yong
    Cho, Won Ik
    Kim, Nam Soo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29: 1626 - 1638