Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

Cited by: 0
Authors
Yang, Hejung [1 ]
Kang, Hong-Goo [1 ]
Affiliations
[1] Yonsei Univ, Dept Elect & Elect Engn, Seoul, South Korea
Source
INTERSPEECH 2023
Keywords
speech enhancement; self-supervised model; feature normalization; representation
DOI
10.21437/Interspeech.2023-623
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have been frequently used as base networks for various pattern classification tasks such as speech recognition. However, not much research has been conducted on applying these types of models to the field of speech signal generation. In this paper, we investigate the feasibility of using pre-trained speech representation models for a downstream speech enhancement task. To alleviate mismatches between the input features of the pre-trained model and the target enhancement model, we adopt a novel feature normalization technique to smoothly link these modules together. Our proposed method enables significant improvements in speech quality compared to baselines when combined with various types of pre-trained speech models.
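The record does not detail the normalization itself, so the following is only a minimal, hypothetical sketch of the general idea: normalizing frozen self-supervised (SSL) features before passing them to a downstream enhancement head. The module name FeatureNorm, the per-utterance statistics with learnable affine rescaling, the 768-dimensional feature size, and the linear mask predictor are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch only; the paper's exact normalization is not given in this record.
    import torch
    import torch.nn as nn

    class FeatureNorm(nn.Module):
        """Normalize SSL features of shape (batch, time, dim) with a learnable rescaling."""

        def __init__(self, dim: int, eps: float = 1e-5):
            super().__init__()
            self.eps = eps
            self.gamma = nn.Parameter(torch.ones(dim))   # learnable scale
            self.beta = nn.Parameter(torch.zeros(dim))   # learnable shift

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            # Per-utterance, per-dimension statistics over the time axis.
            mean = feats.mean(dim=1, keepdim=True)
            std = feats.std(dim=1, keepdim=True)
            normed = (feats - mean) / (std + self.eps)
            return normed * self.gamma + self.beta

    # Usage: bridge a pre-trained SSL encoder and a downstream enhancement head.
    batch, frames, ssl_dim = 4, 200, 768               # e.g., a base-sized SSL hidden dimension
    ssl_feats = torch.randn(batch, frames, ssl_dim)    # stand-in for frozen SSL hidden states
    norm = FeatureNorm(ssl_dim)
    enhancement_head = nn.Linear(ssl_dim, 257)         # hypothetical spectral-mask predictor
    mask = torch.sigmoid(enhancement_head(norm(ssl_feats)))

In this sketch the normalization removes utterance-level statistics from the SSL features so the enhancement head sees inputs with a consistent scale, which is one plausible way to reduce the feature mismatch the abstract describes.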
Pages: 814-818
Number of pages: 5
Related papers
50 records in total
  • [11] Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning
    Chen, Tianlong
    Liu, Sijia
    Chang, Shiyu
    Cheng, Yu
    Amini, Lisa
    Wang, Zhangyang
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 696 - 705
  • [12] EXPLORING EFFICIENT-TUNING METHODS IN SELF-SUPERVISED SPEECH MODELS
    Chen, Zih-Ching
    Fu, Chin-Lun
    Liu, Chih-Ying
    Li, Shang-Wen
    Lee, Hung-yi
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 1120 - 1127
  • [13] Fine-Tuning Self-Supervised Multilingual Sequence-To-Sequence Models for Extremely Low-Resource NMT
    Thillainathan, Sarubi
    Ranathunga, Surangika
    Jayasena, Sanath
    MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON 2021) / 7TH INTERNATIONAL MULTIDISCIPLINARY ENGINEERING RESEARCH CONFERENCE, 2021, : 432 - 437
  • [14] ScoutWav: Two-Step Fine-Tuning on Self-Supervised Automatic Speech Recognition for Low-Resource Environments
    Fatehi, Kavan
    Torres, Mercedes Torres
    Kucukyilmaz, Ayse
    INTERSPEECH 2022, 2022, : 3523 - 3527
  • [15] Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering
    Chang, Heng-Jui
    Liu, Alexander H.
    Glass, James
    INTERSPEECH 2023, 2023, : 2983 - 2987
  • [16] Fine-Tuning for Bayer Demosaicking Through Periodic-Consistent Self-Supervised Learning
    Liu, Chang
    He, Songze
    Xu, Jiajun
    Li, Jia
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 989 - 993
  • [17] SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION
    Gat, Itai
    Aronowitz, Hagai
    Zhu, Weizhong
    Morais, Edmilson
    Hoory, Ron
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7342 - 7346
  • [18] Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition
    Siriwardhana, Shamane
    Reis, Andrew
    Weerasekera, Rivindu
    Nanayakkara, Suranga
    INTERSPEECH 2020, 2020, : 3755 - 3759
  • [19] Exploiting Fine-tuning of Self-supervised Learning Models for Improving Bi-modal Sentiment Analysis and Emotion Recognition
    Yang, Wei
    Fukayama, Satoru
    Heracleous, Panikos
    Ogata, Jun
    INTERSPEECH 2022, 2022, : 1998 - 2002
  • [20] Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation
    Mujtaba, Dena
    Mahapatra, Nihar R.
    Arne, Megan
    Yaruss, J. Scott
    Herring, Caryn
    Bin, Jia
    INTERSPEECH 2024, 2024, : 1275 - 1279