Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

Times cited: 0
Authors
Zhang, Wangyou [1,2]
Saijo, Kohei [3]
Jung, Jee-weon [2]
Li, Chenda [1,2]
Watanabe, Shinji [2]
Qian, Yanmin [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, People's Republic of China
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Waseda Univ, Tokyo, Japan
Source: Interspeech 2024
Funding
U.S. National Science Foundation (NSF)
Keywords
speech enhancement; scalability; robustness; generalizability
DOI
10.21437/Interspeech.2024-1266
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning-based speech enhancement (SE) models have achieved impressive performance over the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains unexplored. Meanwhile, most research focuses on small datasets with limited diversity, leading to a plateau in performance improvement. In this paper, we aim to provide new insights into these issues by exploring the scalability of SE models in terms of architecture, model size, compute budget, and dataset size. Our investigation covers several popular SE architectures and speech data from different domains. Experiments reveal both similarities and differences between the scaling effects in SE and those in other tasks such as speech recognition. These findings further point to under-explored directions in SE, e.g., larger-scale multi-domain corpora and efficiently scalable architectures.
Pages: 1740-1744
Number of pages: 5
Related papers
50 in total
  • [41] Performance Analysis of Various Single Channel Speech Enhancement Algorithms for Automatic Speech Recognition
    Song, Myung-Suk
    Lee, Chang-Heon
    Kang, Hong-Goo
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 1451-1454
  • [42] Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
    Ochiai, Tsubasa
    Iwamoto, Kazuma
    Delcroix, Marc
    Ikeshita, Rintaro
    Sato, Hiroshi
    Araki, Shoko
    Katagiri, Shigeru
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 3589-3602
  • [43] Assessment of Correlation between Objective Measures and Speech Recognition Performance in the Evaluation of Speech Enhancement
    Ding, Pei
    Hao, Jie
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008: 179-182
  • [44] Web application performance assessment: A study of responsiveness, throughput, and scalability
    Alnuhait, Hend
    Alzyadat, Wael
    Althunibat, Ahmad
    Kahtan, Hasan
    Zaqaibeh, Belal
    Al-Khawaja, Haneen A.
    INTERNATIONAL JOURNAL OF ADVANCED AND APPLIED SCIENCES, 2024, 11(09): 214-226
  • [45] Image Enhancement in Spatial Domain: A Comprehensive Study
    Rahman, Shanto
    Rahman, Md Mostafijur
    Hussain, Khalid
    Khaled, Shah Mostafa
    Shoyaib, Mohammad
    2014 17TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2014: 368-373
  • [46] A Microphone Array Beamformer for the Performance Enhancement of Speech Recognizer in Car
    Han, Chul-Hee
    Kang, Hong-Goo
    Hwang, Youngsoo
    Youn, Dae-Hee
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2005, 24(07): 423-430
  • [47] Performance optimizations on U-Net speech enhancement models
    Chee, Jerry
    Braun, Sebastian
    Gopal, Vishak
    Cutler, Ross
    2022 IEEE 24TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2022
  • [48] Improving the performance of DEMUCS in speech enhancement with the perceptual metric loss
    Wu, Zong-Tai
    Chen, Yon-Tong
    Hung, Jeih-weih
    2022 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN, IEEE ICCE-TW 2022, 2022: 267-268
  • [49] Performance comparison of sparsifying basis functions for compressive speech enhancement
    Sahu, Smriti
    Rayavarapu, Neela
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2019, 22(03): 769-783
  • [50] Performance of nonlinear speech enhancement using phase space reconstruction
    Johnson, MT
    Lindgren, AC
    Povinelli, RJ
    Yuan, XL
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003: 920-923