Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

Cited by: 0

Authors:

Zhang, Wangyou [1,2]

Saijo, Kohei [3]

Jung, Jee-weon [2]

Li, Chenda [1,2]

Watanabe, Shinji [2]

Qian, Yanmin [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China

[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

[3] Waseda Univ, Tokyo, Japan

Source:

INTERSPEECH 2024 | 2024

Funding:

National Science Foundation (USA);

Keywords:

speech enhancement; scalability; robustness; generalizability;

DOI:

10.21437/Interspeech.2024-1266

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unrevealed. Meanwhile, the majority of research focuses on small-sized datasets with restricted diversity, leading to a plateau in performance improvement. In this paper, we aim to provide new insights for addressing the above issues by exploring the scalability of SE models in terms of architectures, model sizes, compute budgets, and dataset sizes. Our investigation involves several popular SE architectures and speech data from different domains. Experiments reveal both similarities and distinctions between the scaling effects in SE and other tasks such as speech recognition. These findings further provide insights into the under-explored SE directions, e.g., larger-scale multi-domain corpora and efficiently scalable architectures.

Pages: 1740-1744

Page count: 5
