LAYER-WISE ANALYSIS OF A SELF-SUPERVISED SPEECH REPRESENTATION MODEL

Cited by: 80

Authors:

Pasad, Ankita [1]

Chou, Ju-Chieh [1]

Livescu, Karen [1]

Affiliations:

[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA

Source:

2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021

Keywords:

Self-supervised pre-training; representation analysis; speech representation learning

DOI:

10.1109/ASRU51503.2021.9688093

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of these models and enable the research community to more efficiently develop their usage for downstream applications. In this work, we begin to fill this gap by examining one recent and successful pre-trained model (wav2vec 2.0), via its intermediate representation vectors, using a suite of analysis tools. We use the metrics of canonical correlation, mutual information, and performance on simple downstream tasks with non-parametric probes, in order to (i) query for acoustic and linguistic information content, (ii) characterize the evolution of information across model layers, and (iii) understand how fine-tuning the model for automatic speech recognition (ASR) affects these observations. Our findings motivate modifying the fine-tuning protocol for ASR, which produces improved word error rates in a low-resource setting.


Pages: 914-921

Number of pages: 8

50 entries in total

[31] On the (In)Efficiency of Acoustic Feature Extractors for Self-Supervised Speech Representation Learning
Parcollet, Titouan
Zhang, Shucong
Ramos, Alberto Gil C. P.
van Dalen, Rogier
Bhattacharya, Sourav
INTERSPEECH 2023, 2023, : 581 - 585
[32] EXPLORATION OF LANGUAGE DEPENDENCY FOR JAPANESE SELF-SUPERVISED SPEECH REPRESENTATION MODELS
Ashihara, Takanori
Moriya, Takafumi
Matsuura, Kohei
Tanaka, Tomohiro
arXiv, 2023,
[33] Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
Mu, Zhaoxi
Yang, Xinyu
Sun, Sining
Yang, Qing
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 18815 - 18823
[34] Refining Self-Supervised Learnt Speech Representation using Brain Activations
Li, Hengyu
Mei, Kangdi
Liu, Zhaoci
Ai, Yang
Chen, Liping
Zhang, Jie
Ling, Zhenhua
INTERSPEECH 2024, 2024, : 1480 - 1484
[35] A Comparative Study of Self-Supervised Speech Representation Based Voice Conversion
Huang, Wen-Chin
Yang, Shu-Wen
Hayashi, Tomoki
Toda, Tomoki
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1308 - 1318
[36] SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model
Wang, Jianzong
Zhang, Xulong
Tang, Haobin
Sun, Aolan
Cheng, Ning
Xiao, Jing
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
[37] Speech SimCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
Jiang, Dongwei
Li, Wubo
Cao, Miao
Zou, Wei
Li, Xiangang
INTERSPEECH 2021, 2021, : 1544 - 1548
[38] Phonetic Analysis of Self-supervised Representations of English Speech
Wells, Dan
Tang, Hao
Richmond, Korin
INTERSPEECH 2022, 2022, : 3583 - 3587
[39] Self-Supervised Hypergraph Representation Learning for Sociological Analysis
Sun, Xiangguo
Cheng, Hong
Liu, Bo
Li, Jia
Chen, Hongyang
Xu, Guandong
Yin, Hongzhi
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (11) : 11860 - 11871
[40] ANALYSIS OF LAYER-WISE TRAINING IN DIRECT SPEECH TO SPEECH TRANSLATION USING BI-LSTM
Arya, Lalaram
Agarwal, Ayush
Mishra, Jagabandhu
Prasanna, S. R. Mahadeva
2022 25TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA 2022), 2022,
