LAYER-WISE ANALYSIS OF A SELF-SUPERVISED SPEECH REPRESENTATION MODEL

Cited by: 80
Authors
Pasad, Ankita [1 ]
Chou, Ju-Chieh [1 ]
Livescu, Karen [1 ]
Affiliations
[1] Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
Source
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021
Keywords
Self-supervised pre-training; representation analysis; speech representation learning;
DOI
10.1109/ASRU51503.2021.9688093
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of these models and enable the research community to more efficiently develop their usage for downstream applications. In this work, we begin to fill this gap by examining one recent and successful pre-trained model (wav2vec 2.0), via its intermediate representation vectors, using a suite of analysis tools. We use the metrics of canonical correlation, mutual information, and performance on simple downstream tasks with non-parametric probes, in order to (i) query for acoustic and linguistic information content, (ii) characterize the evolution of information across model layers, and (iii) understand how fine-tuning the model for automatic speech recognition (ASR) affects these observations. Our findings motivate modifying the fine-tuning protocol for ASR, which produces improved word error rates in a low-resource setting.
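The layer-wise probing described in the abstract compares each layer's representations against a reference feature set using canonical correlation. The sketch below illustrates that idea on synthetic data; it is a minimal illustration, not the authors' implementation, and the function name `cca_similarity`, the regularization, and the simulated "layer" representations are all assumptions for demonstration.

```python
import numpy as np

def cca_similarity(X, Y, eps=1e-6):
    """Mean canonical correlation between two views X (n, dx) and Y (n, dy)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance matrices (eps keeps the inverses well-conditioned)
    Sxx = X.T @ X / n + eps * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition of a PSD matrix
        w, V = np.linalg.eigh(S)
        return V @ np.diag(np.clip(w, eps, None) ** -0.5) @ V.T

    # Canonical correlations are the singular values of the whitened cross-covariance
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    rho = np.linalg.svd(T, compute_uv=False)
    return float(np.mean(np.clip(rho, 0.0, 1.0)))

rng = np.random.default_rng(0)
n_frames, latent_dim, layer_dim, label_dim = 500, 8, 32, 8
latent = rng.normal(size=(n_frames, latent_dim))          # shared underlying signal
labels = latent @ rng.normal(size=(latent_dim, label_dim))  # stand-in linguistic features

# Simulated per-layer representations: each "layer" carries progressively more noise,
# so the CCA score should fall off across layers
scores = []
for noise in (0.1, 0.5, 1.0, 2.0):
    reps = latent @ rng.normal(size=(latent_dim, layer_dim))
    reps += noise * rng.normal(size=(n_frames, layer_dim))
    scores.append(cca_similarity(reps, labels))

print([round(s, 3) for s in scores])
```

In a real analysis, `reps` would be frame-level activations extracted from each wav2vec 2.0 transformer layer, and `labels` would be acoustic or linguistic reference features aligned to the same frames; plotting the score per layer traces how that information evolves through the network.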
Pages: 914 - 921
Page count: 8
Related Papers
50 in total
  • [21] Chemistry-Wise Augmentations for Molecule Graph Self-supervised Representation Learning
    Ondar, Evgeniia
    Makarov, Ilya
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2023, PT II, 2023, 14135 : 327 - 336
  • [22] ZERO-SHOT TEXT-TO-SPEECH SYNTHESIS CONDITIONED USING SELF-SUPERVISED SPEECH REPRESENTATION MODEL
    Fujita, Kenichi
    Ashihara, Takanori
    Kanagawa, Hiroki
    Moriya, Takafumi
    Ijima, Yusuke
    2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW, 2023
  • [23] Efficiency-oriented approaches for self-supervised speech representation learning
    Lugo, Luis
    Vielzeuf, Valentin
    International Journal of Speech Technology, 2024, 27 (03) : 765 - 779
  • [24] Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models
    Ashihara, Takanori
    Moriya, Takafumi
    Matsuura, Kohei
    Tanaka, Tomohiro
    ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings, 2023
  • [25] Clustering and Retraining Based Self-Supervised Speech Representation Learning Method
    Zhang, Wenlin
    Liu, Xuepeng
    Niu, Tong
    Yang, Xukui
    Qu, Dan
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2022, 35 (05): 461 - 471
  • [26] Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation
    Luo, Jian
    Wang, Jianzong
    Cheng, Ning
    Xiao, Jing
    INTERSPEECH 2021, 2021, : 1169 - 1173
  • [27] Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning
    Kim, Eesung
    Jeon, Jae-Jin
    Seo, Hyeji
    Kim, Hoon
    INTERSPEECH 2022, 2022, : 1411 - 1415
  • [28] EXPLORING THE INTEGRATION OF SPEECH SEPARATION AND RECOGNITION WITH SELF-SUPERVISED LEARNING REPRESENTATION
    Masuyama, Yoshiki
    Chang, Xuankai
    Zhang, Wangyou
    Cornell, Samuele
    Wang, Zhong-Qiu
    Ono, Nobutaka
    Qian, Yanmin
    Watanabe, Shinji
    2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023
  • [29] Self-supervised Representation Fusion for Speech and Wearable Based Emotion Recognition
    Dissanayake, Vipula
    Seneviratne, Sachith
    Suriyaarachchi, Hussel
    Wen, Elliott
    Nanayakkara, Suranga
    INTERSPEECH 2022, 2022, : 3598 - 3602
  • [30] Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
    Sato, Hiroshi
    Masumura, Ryo
    Ochiai, Tsubasa
    Delcroix, Marc
    Moriya, Takafumi
    Ashihara, Takanori
    Shinayama, Kentaro
    Mizuno, Saki
    Ihori, Mana
    Tanaka, Tomohiro
    Hojo, Nobukatsu
    INTERSPEECH 2023, 2023, : 854 - 858