Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

Cited by: 0
Authors
Zhang, Wangyou [1 ,2 ]
Saijo, Kohei [3 ]
Jung, Jee-weon [2 ]
Li, Chenda [1 ,2 ]
Watanabe, Shinji [2 ]
Qian, Yanmin [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Waseda Univ, Tokyo, Japan
Source
Funding
National Science Foundation (USA);
Keywords
speech enhancement; scalability; robustness; generalizability;
DOI
10.21437/Interspeech.2024-1266
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unrevealed. Meanwhile, the majority of research focuses on small-sized datasets with restricted diversity, leading to a plateau in performance improvement. In this paper, we aim to provide new insights for addressing the above issues by exploring the scalability of SE models in terms of architectures, model sizes, compute budgets, and dataset sizes. Our investigation involves several popular SE architectures and speech data from different domains. Experiments reveal both similarities and distinctions between the scaling effects in SE and other tasks such as speech recognition. These findings further provide insights into the under-explored SE directions, e.g., larger-scale multi-domain corpora and efficiently scalable architectures.
Pages: 1740 - 1744
Page count: 5