DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Citations: 0
Authors
Liu, Alexander H. [1]
Chang, Heng-Jui [1]
Auli, Michael [2]
Hsu, Wei-Ning [2]
Glass, James [1]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Meta AI, New York, NY USA
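Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code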
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering. We show that these concepts complement each other and result in a strong representation learning model for speech. DinoSR first extracts contextualized embeddings from the input audio with a teacher network, then runs an online clustering system on the embeddings to yield a machine-discovered phone inventory, and finally uses the discretized tokens to guide a student network. We show that DinoSR surpasses previous state-of-the-art performance in several downstream tasks, and provide a detailed analysis of the model and the learned discrete units. Code is available at https://github.com/Alexander-H-Liu/dinosr.
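The three steps named in the abstract (teacher embeddings, online clustering, discrete targets for the student) can be illustrated with a toy sketch. This is not the authors' implementation. The function names, the exponential-moving-average (EMA) updates for the codebook and the teacher, and the decay values are all assumptions made for illustration; the abstract does not specify them. See the linked repository for the actual method.

```python
import math


def nearest_code(vec, codebook):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    dists = [sum((v - c) ** 2 for v, c in zip(vec, code)) for code in codebook]
    return dists.index(min(dists))


def online_cluster_step(embeddings, codebook, sums, counts, decay=0.9):
    """Assign teacher embeddings to codewords, then update the codebook in place.

    Assumption: each codeword is tracked as an EMA of the sum and count of its
    assigned embeddings. Codewords with no members in this batch are left as-is.
    """
    dim = len(codebook[0])
    assignments = [nearest_code(e, codebook) for e in embeddings]
    for k in range(len(codebook)):
        members = [e for e, a in zip(embeddings, assignments) if a == k]
        if not members:
            continue
        batch_sum = [sum(m[d] for m in members) for d in range(dim)]
        sums[k] = [decay * s + (1 - decay) * b for s, b in zip(sums[k], batch_sum)]
        counts[k] = decay * counts[k] + (1 - decay) * len(members)
        codebook[k] = [s / counts[k] for s in sums[k]]
    return assignments


def masked_cross_entropy(logits, targets, mask):
    """Mean cross-entropy of student logits against cluster indices,
    computed only at positions where mask is True."""
    total, n = 0.0, 0
    for row, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue
        top = max(row)
        log_z = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_z - row[target]
        n += 1
    return total / max(n, 1)


def ema_update(teacher, student, decay=0.99):
    """Assumed self-distillation step: the teacher tracks the student by EMA."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]


# Toy data standing in for teacher embeddings of four audio frames.
frames = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 8.0]]
codebook = [[1.0, 1.0], [9.0, 9.0]]
sums = [list(c) for c in codebook]   # running sums start at the codewords
counts = [1.0 for _ in codebook]     # running counts start at 1

targets = online_cluster_step(frames, codebook, sums, counts)
# Hypothetical student logits over the two clusters; frames 1 and 2 are masked.
student_logits = [[2.0, 0.0], [1.0, 0.0], [0.0, 3.0], [0.0, 1.0]]
loss = masked_cross_entropy(student_logits, targets, [False, True, True, False])
```

In this sketch the discrete targets come only from the teacher side, and the student is trained only on masked positions. That matches the masked-prediction framing in the abstract, but the exact layers, codebook sizes, and schedules used by DinoSR are not given here.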
Pages: 17
Related papers
50 entries in total
  • [31] Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
    Mu, Zhaoxi
    Yang, Xinyu
    Sun, Sining
    Yang, Qing
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 18815 - 18823
  • [32] Few-shot Learning with Online Self-Distillation
    Liu, Sihan
    Wang, Yue
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 1067 - 1070
  • [33] SSSD: Self-Supervised Self Distillation
    Chen, Wei-Chi
    Chu, Wei-Ta
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2769 - 2776
  • [34] SKILL: Similarity-Aware Knowledge Distillation for Speech Self-Supervised Learning
    Zampierin, Luca
    Hacene, Ghouthi Boukli
    Nguyen, Bac
    Ravanelli, Mirco
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 675 - 679
  • [35] FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning
    Lee, Yeonghyeon
    Jang, Kangwook
    Goo, Jahyun
    Jung, Youngmoon
    Kim, Hoirin
    INTERSPEECH 2022, 2022, : 3588 - 3592
  • [36] Self-supervised network for oriented synthetic aperture radar ship detection based on self-distillation
    Li, Wentao
    Xu, Haixia
    Shi, Furong
    Yuan, Liming
    Wen, Xianbin
    JOURNAL OF APPLIED REMOTE SENSING, 2024, 18 (04)
  • [37] Speech SimCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
    Jiang, Dongwei
    Li, Wubo
    Cao, Miao
    Zou, Wei
    Li, Xiangang
    INTERSPEECH 2021, 2021, : 1544 - 1548
  • [38] Self-Supervised Self-Organizing Clustering Network: A Novel Unsupervised Representation Learning Method
    Li, Shuo
    Liu, Fang
    Jiao, Licheng
    Chen, Puhua
    Li, Lingling
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 1857 - 1871
  • [39] Self-Supervised Relational Reasoning for Representation Learning
    Patacchiola, Massimiliano
    Storkey, Amos
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [40] Self-Supervised Learning for Specified Latent Representation
    Liu, Chicheng
    Song, Libin
    Zhang, Jiwen
    Chen, Ken
    Xu, Jing
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2020, 28 (01) : 47 - 59