LAYER-WISE ANALYSIS OF A SELF-SUPERVISED SPEECH REPRESENTATION MODEL

Cited by: 80
Authors
Pasad, Ankita [1]
Chou, Ju-Chieh [1]
Livescu, Karen [1]
Affiliations
[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
Keywords
Self-supervised pre-training; representation analysis; speech representation learning
DOI
10.1109/ASRU51503.2021.9688093
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of these models and enable the research community to more efficiently develop their usage for downstream applications. In this work, we begin to fill this gap by examining one recent and successful pre-trained model (wav2vec 2.0), via its intermediate representation vectors, using a suite of analysis tools. We use the metrics of canonical correlation, mutual information, and performance on simple downstream tasks with non-parametric probes, in order to (i) query for acoustic and linguistic information content, (ii) characterize the evolution of information across model layers, and (iii) understand how fine-tuning the model for automatic speech recognition (ASR) affects these observations. Our findings motivate modifying the fine-tuning protocol for ASR, which produces improved word error rates in a low-resource setting.
Pages: 914-921
Number of pages: 8