Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

Cited by: 0
Authors
Yang, Hejung [1]
Kang, Hong-Goo [1]
Affiliations
[1] Yonsei Univ, Dept Elect & Elect Engn, Seoul, South Korea
Source
INTERSPEECH 2023
Keywords
speech enhancement; self-supervised model; feature normalization; representation
DOI
10.21437/Interspeech.2023-623
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Large representation models pre-trained via self-supervised learning have gained popularity across many fields of machine learning because they extract high-quality, salient features from input data. As such, they are frequently used as base networks for pattern classification tasks such as speech recognition. However, little research has examined applying these models to speech signal generation. In this paper, we investigate the feasibility of using pre-trained speech representation models for a downstream speech enhancement task. To alleviate mismatches between the input features of the pre-trained model and the target enhancement model, we adopt a novel feature normalization technique that smoothly links the two modules. When combined with various types of pre-trained speech models, the proposed method yields significant improvements in speech quality over the baselines.
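The abstract describes the feature normalization only at a high level. As an illustration, the Python sketch below shows one plausible form such a bridging step could take: utterance-level mean-variance normalization of the pre-trained model's features, followed by a learnable affine transform, before the features enter the enhancement network. The class name FeatureNorm, the parameters gamma and beta, and the 768-dimensional input (sized like a wav2vec 2.0 output) are illustrative assumptions, not the paper's actual implementation.

    import torch
    import torch.nn as nn

    class FeatureNorm(nn.Module):
        # Utterance-level mean-variance normalization with a learnable
        # affine rescaling. Hypothetical sketch; the paper's actual
        # technique may differ in detail.
        def __init__(self, dim: int, eps: float = 1e-5):
            super().__init__()
            self.eps = eps
            self.gamma = nn.Parameter(torch.ones(dim))   # learnable scale
            self.beta = nn.Parameter(torch.zeros(dim))   # learnable shift

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            # feats: (batch, frames, dim) features from a pre-trained model
            mean = feats.mean(dim=1, keepdim=True)
            std = feats.std(dim=1, keepdim=True)
            normed = (feats - mean) / (std + self.eps)
            return normed * self.gamma + self.beta

    # Usage: bridge pre-trained features into an enhancement head.
    ssl_feats = torch.randn(4, 200, 768)   # stand-in for SSL model output
    bridge = FeatureNorm(dim=768)
    enhancer_in = bridge(ssl_feats)        # statistics-matched features
    print(enhancer_in.shape)               # torch.Size([4, 200, 768])

In such a setup, the normalized features would replace the raw pre-trained features at the input of the enhancement model, letting the learnable scale and shift absorb the statistical mismatch between the two modules.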
Pages: 814-818
Page count: 5
Related Papers
50 records in total; entries [31] to [40] are shown below.
  • [31] On Combining Global and Localized Self-Supervised Models of Speech
    Dumpala, Sri Harsha
    Sastry, Chandramouli S.
    Uher, Rudolf
    Oore, Sageev
    INTERSPEECH 2022, 2022, : 3593 - 3597
  • [32] The Efficacy of Self-Supervised Speech Models as Audio Representations
    Wu, Tung-Yu
    Hsu, Tsu-Yuan
    Li, Chen-An
    Lin, Tzu-Han
    Lee, Hung-yi
HEAR: HOLISTIC EVALUATION OF AUDIO REPRESENTATIONS, 2021, 166 : 90 - 110
  • [33] Cascaded encoders for fine-tuning ASR models on overlapped speech
    Rose, Richard
    Chang, Oscar
    Siohan, Olivier
    INTERSPEECH 2023, 2023, : 3457 - 3461
  • [34] One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification
    Heo, Jungwoo
    Lim, Chan-yeong
    Kim, Ju-ho
    Shin, Hyun-seo
    Yu, Ha-Jin
    INTERSPEECH 2023, 2023, : 5271 - 5275
  • [35] Assessment of Self-Supervised Denoising Methods for Esophageal Speech Enhancement
    Amarjouf, Madiha
    Ibn Elhaj, El Hassan
    Chami, Mouhcine
    Ezzine, Kadria
    Di Martino, Joseph
APPLIED SCIENCES-BASEL, 2024, 14 (15)
  • [36] SFT-SGAT: A semi-supervised fine-tuning self-supervised graph attention network for emotion recognition and consciousness detection
    Qiu, Lina
    Zhong, Liangquan
    Li, Jianping
    Feng, Weisen
    Zhou, Chengju
    Pan, Jiahui
    NEURAL NETWORKS, 2024, 180
  • [37] Efficient Personalized Speech Enhancement Through Self-Supervised Learning
    Sivaraman, Aswin
    Kim, Minje
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1342 - 1356
  • [38] Self-Supervised Feature Enhancement: Applying Internal Pretext Task to Supervised Learning
    Xie, Tianshu
    Yang, Yuhang
    Ding, Zilin
    Cheng, Xuan
    Wang, Xiaomin
    Gong, Haigang
    Liu, Ming
    IEEE ACCESS, 2023, 11 : 1708 - 1717
  • [39] Fine-Tuning Language Models For Semi-Supervised Text Mining
    Chen, Xinyu
    Beaver, Ian
    Freeman, Cynthia
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 3608 - 3617
  • [40] On Separate Normalization in Self-supervised Transformers
    Chen, Xiaohui
    Wang, Yinkai
    Du, Yuanqi
    Hassoun, Soha
    Liu, Li-Ping
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023