LAYER-WISE ANALYSIS OF A SELF-SUPERVISED SPEECH REPRESENTATION MODEL

Cited by: 80

Authors:

Pasad, Ankita [1]

Chou, Ju-Chieh [1]

Livescu, Karen [1]

Affiliations:

[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA

Source:

2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021

Keywords:

Self-supervised pre-training; representation analysis; speech representation learning

DOI:

10.1109/ASRU51503.2021.9688093

Chinese Library Classification:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of these models and enable the research community to more efficiently develop their usage for downstream applications. In this work, we begin to fill this gap by examining one recent and successful pre-trained model (wav2vec 2.0), via its intermediate representation vectors, using a suite of analysis tools. We use the metrics of canonical correlation, mutual information, and performance on simple downstream tasks with non-parametric probes, in order to (i) query for acoustic and linguistic information content, (ii) characterize the evolution of information across model layers, and (iii) understand how fine-tuning the model for automatic speech recognition (ASR) affects these observations. Our findings motivate modifying the fine-tuning protocol for ASR, which produces improved word error rates in a low-resource setting.

Pages: 914-921

Page count: 8
