Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

Cited by: 0

Authors:

Zhang, Wangyou [1,2]

Saijo, Kohei [3]

Jung, Jee-weon [2]

Li, Chenda [1,2]

Watanabe, Shinji [2]

Qian, Yanmin [1]

Affiliations:

[1] Shanghai Jiao Tong University, Shanghai, People's Republic of China

[2] Carnegie Mellon University, Pittsburgh, PA 15213 USA

[3] Waseda University, Tokyo, Japan

Source:

INTERSPEECH 2024 | 2024

Funding:

U.S. National Science Foundation;

Keywords:

speech enhancement; scalability; robustness; generalizability;

DOI:

10.21437/Interspeech.2024-1266

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unrevealed. Meanwhile, the majority of research focuses on small-sized datasets with restricted diversity, leading to a plateau in performance improvement. In this paper, we aim to provide new insights for addressing the above issues by exploring the scalability of SE models in terms of architectures, model sizes, compute budgets, and dataset sizes. Our investigation involves several popular SE architectures and speech data from different domains. Experiments reveal both similarities and distinctions between the scaling effects in SE and other tasks such as speech recognition. These findings further provide insights into the under-explored SE directions, e.g., larger-scale multi-domain corpora and efficiently scalable architectures.

Pages: 1740-1744

Page count: 5
