Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

Cited by: 0
Authors
Yang, Hejung [1]
Kang, Hong-Goo [1]
Affiliations
[1] Yonsei Univ, Dept Elect & Elect Engn, Seoul, South Korea
Source
Keywords
speech enhancement; self-supervised model; feature normalization; representation
DOI
10.21437/Interspeech.2023-623
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Large, pre-trained representation models trained using self-supervised learning have gained popularity in various fields of machine learning because they are able to extract high-quality salient features from input data. As such, they have been frequently used as base networks for various pattern classification tasks such as speech recognition. However, not much research has been conducted on applying these types of models to the field of speech signal generation. In this paper, we investigate the feasibility of using pre-trained speech representation models for a downstream speech enhancement task. To alleviate mismatches between the input features of the pre-trained model and the target enhancement model, we adopt a novel feature normalization technique to smoothly link these modules together. Our proposed method enables significant improvements in speech quality compared to baselines when combined with various types of pre-trained speech models.
Pages: 814-818
Number of pages: 5
Related papers
50 entries in total
  • [41] Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
    Fan, Ruchao
    Shankar, Natarajan Balaji
    Alwani, Abeer
    INTERSPEECH 2024, 2024, : 5173 - 5177
  • [42] Word Discovery in Visually Grounded, Self-Supervised Speech Models
    Peng, Puyuan
    Harwath, David
    INTERSPEECH 2022, 2022, : 2823 - 2827
  • [43] Word Discovery in Visually Grounded, Self-Supervised Speech Models
    Department of Computer Science, The University of Texas, Austin, United States
    Proc. Annu. Conf. Int. Speech. Commun. Assoc., INTERSPEECH, (2823-2827):
  • [44] Membership Inference Attacks Against Self-supervised Speech Models
    Tseng, Wei-Cheng
    Kao, Wei-Tsung
    Lee, Hung-yi
    INTERSPEECH 2022, 2022, : 5040 - 5044
  • [45] On the (In)Efficiency of Acoustic Feature Extractors for Self-Supervised Speech Representation Learning
    Parcollet, Titouan
    Zhang, Shucong
    Ramos, Alberto Gil C. P.
    van Dalen, Rogier
    Bhattacharya, Sourav
    INTERSPEECH 2023, 2023, : 581 - 585
  • [46] Investigation of Ensemble of Self-Supervised Models for Speech Emotion Recognition
    Wu, Yanfeng
    Yue, Pengcheng
    Cheng, Cuiping
    Li, Taihao
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 988 - 995
  • [47] DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
    Peng, Yifan
    Sudo, Yui
    Muhammad, Shakeel
    Watanabe, Shinji
    INTERSPEECH 2023, 2023, : 62 - 66
  • [48] North Sami Dialect Identification with Self-supervised Speech Models
    Kakouros, Sofoklis
    Hiovain-Asikainen, Katri
    INTERSPEECH 2023, 2023, : 5306 - 5310
  • [49] EFFICIENT ADAPTER TRANSFER OF SELF-SUPERVISED SPEECH MODELS FOR AUTOMATIC SPEECH RECOGNITION
    Thomas, Bethan
    Kessler, Samuel
    Karout, Salah
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7102 - 7106
  • [50] Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
    Sato, Hiroshi
    Masumura, Ryo
    Ochiai, Tsubasa
    Delcroix, Marc
    Moriya, Takafumi
    Ashihara, Takanori
    Shinayama, Kentaro
    Mizuno, Saki
    Ihori, Mana
    Tanaka, Tomohiro
    Hojo, Nobukatsu
    INTERSPEECH 2023, 2023, : 854 - 858