Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

Authors
Zhang, Wangyou [1, 2]
Saijo, Kohei [3]
Jung, Jee-weon [2]
Li, Chenda [1, 2]
Watanabe, Shinji [2]
Qian, Yanmin [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Waseda Univ, Tokyo, Japan
Source
INTERSPEECH 2024
Funding
U.S. National Science Foundation
Keywords
speech enhancement; scalability; robustness; generalizability
DOI
10.21437/Interspeech.2024-1266
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their potential for scaling remains unexplored. Meanwhile, most research focuses on small datasets with limited diversity, leading to a plateau in performance improvement. In this paper, we aim to provide new insights into these issues by exploring the scalability of SE models in terms of architectures, model sizes, compute budgets, and dataset sizes. Our investigation involves several popular SE architectures and speech data from different domains. Experiments reveal both similarities and differences between the scaling effects in SE and those in other tasks such as speech recognition. These findings also point to under-explored directions in SE, e.g., larger-scale multi-domain corpora and efficiently scalable architectures.
Pages: 1740–1744
Number of pages: 5