LAYER-WISE ANALYSIS OF A SELF-SUPERVISED SPEECH REPRESENTATION MODEL

Cited by: 80
Authors
Pasad, Ankita [1 ]
Chou, Ju-Chieh [1 ]
Livescu, Karen [1 ]
Affiliation
[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
Keywords
Self-supervised pre-training; representation analysis; speech representation learning
DOI
10.1109/ASRU51503.2021.9688093
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of these models and enable the research community to more efficiently develop their usage for downstream applications. In this work, we begin to fill this gap by examining one recent and successful pre-trained model (wav2vec 2.0), via its intermediate representation vectors, using a suite of analysis tools. We use the metrics of canonical correlation, mutual information, and performance on simple downstream tasks with non-parametric probes, in order to (i) query for acoustic and linguistic information content, (ii) characterize the evolution of information across model layers, and (iii) understand how fine-tuning the model for automatic speech recognition (ASR) affects these observations. Our findings motivate modifying the fine-tuning protocol for ASR, which produces improved word error rates in a low-resource setting.
Pages: 914-921
Page count: 8
Related papers
50 in total
  • [1] LAYER-WISE ANALYSIS OF SELF-SUPERVISED ACOUSTIC WORD EMBEDDINGS: A STUDY ON SPEECH EMOTION RECOGNITION
    Saliba, Alexandra
    Li, Yuanchao
    Sanabria, Ramon
    Lai, Catherine
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024 : 590 - 594
  • [2] Incremental Layer-Wise Self-Supervised Learning for Efficient Unsupervised Speech Domain Adaptation On Device
    Huo, Zhouyuan
    Hwang, Dongseong
    Sim, Khe Chai
    Garg, Shefali
    Misra, Ananya
    Siddhartha, Nikhil
    Strohman, Trevor
    Beaufays, Francoise
    INTERSPEECH 2022, 2022 : 4845 - 4849
  • [3] L-DAWA: Layer-wise Divergence Aware Weight Aggregation in Federated Self-Supervised Visual Representation Learning
    Rehman, Yasar Abbas Ur
    Gao, Yan
    de Gusmao, Pedro Porto Buarque
    Alibeigi, Mina
    Shen, Jiajun
    Lane, Nicholas D.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023 : 16418 - 16427
  • [4] Investigation of Layer-Wise Speech Representations in Self-Supervised Learning Models: A Cross-Lingual Study in Detecting Depression
    Maji, Bubai
    Guha, Rajlakshmi
    Routray, Aurobinda
    Nasreen, Shazia
    Majumdar, Debabrata
    INTERSPEECH 2024, 2024 : 3020 - 3024
  • [5] A layer-wise fusion network incorporating self-supervised learning for multimodal MR image synthesis
    Zhou, Qian
    Zou, Hua
    FRONTIERS IN GENETICS, 2022, 13
  • [6] Self-Supervised Speech Representation Learning: A Review
    Mohamed, Abdelrahman
    Lee, Hung-yi
    Borgholt, Lasse
    Havtorn, Jakob D.
    Edin, Joakim
    Igel, Christian
    Kirchhoff, Katrin
    Li, Shang-Wen
    Livescu, Karen
    Maaloe, Lars
    Sainath, Tara N.
    Watanabe, Shinji
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1179 - 1210
  • [7] Unveiling the Linguistic Capabilities of a Self-Supervised Speech Model Through Cross-Lingual Benchmark and Layer-Wise Similarity Analysis
    Ashihara, Takanori
    Delcroix, Marc
    Ijima, Yusuke
    Kashino, Makio
    IEEE ACCESS, 2024, 12 : 98835 - 98855
  • [8] Self-Supervised Learning With Segmental Masking for Speech Representation
    Yue, Xianghu
    Lin, Jingru
    Gutierrez, Fabian Ritter
    Li, Haizhou
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1367 - 1379
  • [9] Phonetically Motivated Self-Supervised Speech Representation Learning
    Yue, Xianghu
    Li, Haizhou
    INTERSPEECH 2021, 2021 : 746 - 750