LAYER-WISE ANALYSIS OF A SELF-SUPERVISED SPEECH REPRESENTATION MODEL

Times cited: 80

Authors:

Pasad, Ankita [1]

Chou, Ju-Chieh [1]

Livescu, Karen [1]

Affiliation:

[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA

Source:

2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021

Keywords:

Self-supervised pre-training; representation analysis; speech representation learning

DOI:

10.1109/ASRU51503.2021.9688093

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of these models and enable the research community to more efficiently develop their usage for downstream applications. In this work, we begin to fill this gap by examining one recent and successful pre-trained model (wav2vec 2.0), via its intermediate representation vectors, using a suite of analysis tools. We use the metrics of canonical correlation, mutual information, and performance on simple downstream tasks with non-parametric probes, in order to (i) query for acoustic and linguistic information content, (ii) characterize the evolution of information across model layers, and (iii) understand how fine-tuning the model for automatic speech recognition (ASR) affects these observations. Our findings motivate modifying the fine-tuning protocol for ASR, which produces improved word error rates in a low-resource setting.
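
The abstract names canonical correlation as one of the analysis metrics. As a rough illustration only, the sketch below computes plain linear CCA between two views of the same frames, for example one layer's output vectors and a set of reference features. The function name `canonical_correlations`, the ridge term `eps`, and the synthetic data are illustrative assumptions; the paper may use a different CCA variant, feature set, and preprocessing.

```python
import numpy as np


def canonical_correlations(x: np.ndarray, y: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Plain (linear) CCA between two views of the same n frames.

    x: (n, d1) array, e.g. one layer's frame-level representations (assumed).
    y: (n, d2) array, e.g. reference features for the same frames (assumed).
    Returns the min(d1, d2) canonical correlations in descending order.
    """
    # Center each view so covariances are computed around the mean.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]

    # Sample covariances; the small ridge `eps` (an assumption) keeps them invertible.
    cxx = x.T @ x / (n - 1) + eps * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + eps * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)

    def inv_sqrt(c: np.ndarray) -> np.ndarray:
        # Symmetric inverse square root via eigendecomposition: V diag(w^-1/2) V^T.
        w, v = np.linalg.eigh(c)
        return (v / np.sqrt(w)) @ v.T

    # Singular values of the whitened cross-covariance are the canonical correlations.
    t = inv_sqrt(cxx) @ cxy @ inv_sqrt(cyy)
    rho = np.linalg.svd(t, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for one layer's outputs (1000 frames, 8 dims).
    layer = rng.standard_normal((1000, 8))
    # Hypothetical "features" that are a noisy linear function of the layer.
    feats = layer[:, :3] + 0.1 * rng.standard_normal((1000, 3))
    print(canonical_correlations(layer, feats).mean())
```

Averaging the returned correlations gives one scalar per layer that can be compared across layers; that averaging is one common convention for summarizing CCA, not necessarily the one used in the paper.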


Pages: 914-921

Page count: 8
