Refining Self-Supervised Learnt Speech Representation using Brain Activations

Cited by: 0
Authors
Li, Hengyu [1 ]
Mei, Kangdi [1 ]
Liu, Zhaoci [1 ]
Ai, Yang [1 ]
Chen, Liping [1 ]
Zhang, Jie [1 ]
Ling, Zhenhua [1 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China
Source
INTERSPEECH 2024 | 2024
Funding
National Natural Science Foundation of China
Keywords
Pre-trained speech model; wav2vec2.0; brain activation; SUPERB;
DOI
10.21437/Interspeech.2024-604
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Previous work has shown that speech representations extracted by self-supervised pre-trained models exhibit similarities with human brain activations during speech perception, and that fine-tuning these models on downstream tasks can further increase this similarity. However, it remains unclear whether the similarity can in turn be used to optimize pre-trained speech models. In this work, we therefore propose to use brain activations recorded by fMRI to refine the widely used wav2vec2.0 model, aligning its representations with human neural responses. Experimental results on SUPERB show that this refinement benefits several downstream tasks, e.g., speaker verification, automatic speech recognition, and intent classification. The proposed method can thus be regarded as a new alternative for improving self-supervised speech models.
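The abstract does not spell out how model representations are compared or aligned with neural responses; that detail is in the full paper. As a hedged sketch only, the snippet below shows one standard way such model-brain similarity is quantified in the literature: linear Centered Kernel Alignment (CKA) between model features and fMRI voxel responses for the same stimuli. All names and dimensions here are illustrative assumptions, not the paper's actual objective or implementation.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between representation matrices
    X (n_stimuli x d_model) and Y (n_stimuli x n_voxels).
    Returns a value in [0, 1]; higher means more similar geometry."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# Toy illustration (hypothetical data, not real fMRI): 500 stimuli, 64-dim
# "model" features, and 80 "voxels" whose responses depend linearly (plus
# noise) on part of the model features.
rng = np.random.default_rng(0)
model_feats = rng.standard_normal((500, 64))   # stand-in for model states
brain_feats = model_feats[:, :32] @ rng.standard_normal((32, 80)) \
              + 0.1 * rng.standard_normal((500, 80))
print(f"model-brain CKA: {linear_cka(model_feats, brain_feats):.3f}")
```

A similarity score like this could serve either as an evaluation metric or, in a differentiable form, as an auxiliary training signal; which role it plays in the paper's refinement procedure is not stated in this record.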
Pages: 1480 - 1484 (5 pages)
Related Papers (50 in total)
  • [1] Self-Supervised Speech Representation Learning: A Review
    Mohamed, Abdelrahman
    Lee, Hung-yi
    Borgholt, Lasse
    Havtorn, Jakob D.
    Edin, Joakim
    Igel, Christian
    Kirchhoff, Katrin
    Li, Shang-Wen
    Livescu, Karen
    Maaloe, Lars
    Sainath, Tara N.
    Watanabe, Shinji
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1179 - 1210
  • [2] Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning
    Kim, Eesung
    Jeon, Jae-Jin
    Seo, Hyeji
    Kim, Hoon
    INTERSPEECH 2022, 2022, : 1411 - 1415
  • [3] Self-Supervised Learning With Segmental Masking for Speech Representation
    Yue, Xianghu
    Lin, Jingru
    Gutierrez, Fabian Ritter
    Li, Haizhou
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1367 - 1379
  • [4] Phonetically Motivated Self-Supervised Speech Representation Learning
    Yue, Xianghu
    Li, Haizhou
    INTERSPEECH 2021, 2021, : 746 - 750
  • [5] The Effect of Spoken Language on Speech Enhancement Using Self-Supervised Speech Representation Loss Functions
    Close, George
    Hain, Thomas
    Goetze, Stefan
    2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023
  • [6] TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
    Liu, Andy T.
    Li, Shang-Wen
    Lee, Hung-yi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 2351 - 2366
  • [7] Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?
    Zaiem, Salah
    Kemiche, Youcef
    Parcollet, Titouan
    Essid, Slim
    Ravanelli, Mirco
    INTERSPEECH 2023, 2023, : 2873 - 2877
  • [8] CONTENTVEC: An Improved Self-Supervised Speech Representation by Disentangling Speakers
    Qian, Kaizhi
    Zhang, Yang
    Gao, Heting
    Ni, Junrui
    Lai, Cheng-I Jeff
    Cox, David
    Hasegawa-Johnson, Mark
    Chang, Shiyu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [9] CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
    Meng, Chutong
    Ao, Junyi
    Ko, Tom
    Wang, Mingxuan
    Li, Haizhou
    INTERSPEECH 2023, 2023, : 2978 - 2982