LAYER-WISE ANALYSIS OF A SELF-SUPERVISED SPEECH REPRESENTATION MODEL

Cited by: 80
Authors
Pasad, Ankita [1 ]
Chou, Ju-Chieh [1 ]
Livescu, Karen [1 ]
Affiliations
[1] Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
Keywords
Self-supervised pre-training; representation analysis; speech representation learning
DOI
10.1109/ASRU51503.2021.9688093
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of these models and enable the research community to more efficiently develop their usage for downstream applications. In this work, we begin to fill this gap by examining one recent and successful pre-trained model (wav2vec 2.0), via its intermediate representation vectors, using a suite of analysis tools. We use the metrics of canonical correlation, mutual information, and performance on simple downstream tasks with non-parametric probes, in order to (i) query for acoustic and linguistic information content, (ii) characterize the evolution of information across model layers, and (iii) understand how fine-tuning the model for automatic speech recognition (ASR) affects these observations. Our findings motivate modifying the fine-tuning protocol for ASR, which produces improved word error rates in a low-resource setting.
Pages: 914-921
Number of pages: 8
Related papers
50 in total
  • [41] DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
    Liu, Alexander H.
    Chang, Heng-Jui
    Auli, Michael
    Hsu, Wei-Ning
    Glass, James
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [42] IMPROVING SELF-SUPERVISED LEARNING FOR SPEECH RECOGNITION WITH INTERMEDIATE LAYER SUPERVISION
    Wang, Chengyi
    Wu, Yu
    Chen, Sanyuan
    Liu, Shujie
    Li, Jinyu
    Qian, Yao
    Yang, Zhenglu
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7092 - 7096
  • [43] THE EFFECT OF SPOKEN LANGUAGE ON SPEECH ENHANCEMENT USING SELF-SUPERVISED SPEECH REPRESENTATION LOSS FUNCTIONS
    Close, George
    Hain, Thomas
    Goetze, Stefan
    2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023
  • [44] End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
    Chang, Xuankai
    Maekaku, Takashi
    Fujita, Yuya
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 3819 - 3823
  • [45] EXPLORATION OF A SELF-SUPERVISED SPEECH MODEL: A STUDY ON EMOTIONAL CORPORA
    Li, Yuanchao
    Mohamied, Yumnah
    Bell, Peter
    Lai, Catherine
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 868 - 875
  • [46] Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
    Liu, Rui
    Ma, Zening
    INTERSPEECH 2024, 2024, : 3180 - 3184
  • [47] Research on Mongolian Speech Recognition Based on the Self-supervised Model
    Su, Hongyi
    Xue, Yu
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, : 199 - 203
  • [48] LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
    Evain, Solene
    Ha Nguyen
    Hang Le
    Boito, Marcely Zanon
    Mdhaffar, Salima
    Alisamir, Sina
    Tong, Ziyi
    Tomashenko, Natalia
    Dinarelli, Marco
    Parcollet, Titouan
    Allauzen, Alexandre
    Esteve, Yannick
    Lecouteux, Benjamin
    Portet, Francois
    Rossato, Solange
    Ringeval, Fabien
    Schwab, Didier
    Besacier, Laurent
    INTERSPEECH 2021, 2021, : 1439 - 1443
  • [49] HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
    Hsu, Wei-Ning
    Bolte, Benjamin
    Tsai, Yao-Hung Hubert
    Lakhotia, Kushal
    Salakhutdinov, Ruslan
    Mohamed, Abdelrahman
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3451 - 3460
  • [50] Self-supervised Context-aware Style Representation for Expressive Speech Synthesis
    Wu, Yihan
    Wang, Xi
    Zhang, Shaofei
    He, Lei
    Song, Ruihua
    Nie, Jian-Yun
    INTERSPEECH 2022, 2022, : 5503 - 5507