Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

Cited by: 0
Authors
Yang, Hejung [1 ]
Kang, Hong-Goo [1 ]
Affiliations
[1] Yonsei Univ, Dept Elect & Elect Engn, Seoul, South Korea
Keywords
speech enhancement; self-supervised model; feature normalization; representation
DOI
10.21437/Interspeech.2023-623
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have been frequently used as base networks for various pattern classification tasks such as speech recognition. However, not much research has been conducted on applying these types of models to the field of speech signal generation. In this paper, we investigate the feasibility of using pre-trained speech representation models for a downstream speech enhancement task. To alleviate mismatches between the input features of the pre-trained model and the target enhancement model, we adopt a novel feature normalization technique to smoothly link these modules together. Our proposed method enables significant improvements in speech quality compared to baselines when combined with various types of pre-trained speech models.
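The abstract does not give the details of the proposed normalization, but the general idea of matching the statistics of pre-trained features to what a downstream enhancement model expects can be illustrated with simple per-dimension standardization. The sketch below is a generic illustration under that assumption, not the authors' exact technique; the feature matrix shape (frames x dimensions) is hypothetical.

```python
import numpy as np

def normalize_features(feats: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize each feature dimension to zero mean and unit variance.

    A generic sketch of feature normalization for bridging a pre-trained
    representation model and a downstream module; not the paper's method.
    feats: (num_frames, num_dims) array of pre-trained model outputs.
    """
    mean = feats.mean(axis=0, keepdims=True)
    std = feats.std(axis=0, keepdims=True)
    return (feats - mean) / (std + eps)

# Hypothetical pre-trained feature matrix: 100 frames, 768 dimensions,
# with an arbitrary offset and scale standing in for a train/target mismatch.
rng = np.random.default_rng(0)
feats = rng.normal(loc=5.0, scale=3.0, size=(100, 768))
normed = normalize_features(feats)
```

In practice such statistics would typically be estimated on training data (or tracked as running averages) rather than per utterance, so that the mapping is consistent between training and inference.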
Pages: 814-818 (5 pages)