SELF-SUPERVISED SPEAKER RECOGNITION WITH LOSS-GATED LEARNING

Cited by: 21
Authors
Tao, Ruijie [1 ]
Lee, Kong Aik [2 ]
Das, Rohan Kumar [3 ]
Hautamaki, Ville [1 ,4 ]
Li, Haizhou [1 ,5 ]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[2] ASTAR, Inst Infocomm Res, Singapore, Singapore
[3] Fortemedia Singapore, Singapore, Singapore
[4] Univ Eastern Finland, Kuopio, Finland
[5] Chinese Univ Hong Kong, Shenzhen, Peoples R China
Funding
National Research Foundation, Singapore
Keywords
self-supervised speaker recognition; pseudo label selection; loss-gated learning;
DOI
10.1109/ICASSP43922.2022.9747162
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In self-supervised learning for speaker recognition, pseudo labels are useful as supervision signals. However, a speaker recognition model does not always benefit from pseudo labels because they are unreliable. In this work, we observe that a speaker recognition network tends to fit data with reliable labels faster than data with unreliable labels. This motivates us to study a loss-gated learning (LGL) strategy, which selects reliable labels by exploiting the fitting ability of the neural network during training. With the proposed LGL, our speaker recognition model obtains a 46.3% performance gain over the system without it. Furthermore, the proposed self-supervised speaker recognition system with LGL, trained on the VoxCeleb2 dataset without any labels, achieves an equal error rate of 1.66% on the VoxCeleb1 original test set.
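The abstract describes LGL as gating the training loss so that only samples the network fits easily, i.e., those with presumably reliable pseudo labels, contribute to the parameter update. Below is a minimal PyTorch sketch of that idea, assuming pseudo labels have already been produced (for example, by clustering self-supervised embeddings); the function name, the fixed gate threshold, and the cross-entropy objective are illustrative assumptions, not the authors' exact recipe.

```python
# Minimal sketch of a loss-gated training step (hypothetical helper, not the
# paper's reference implementation). Assumes `pseudo_labels` come from an
# earlier clustering stage over self-supervised embeddings.
import torch
import torch.nn as nn

def lgl_train_step(model, classifier, batch, pseudo_labels, optimizer, gate=1.0):
    """Back-propagate only through samples whose per-sample loss is below `gate`."""
    criterion = nn.CrossEntropyLoss(reduction="none")  # keep one loss per sample

    embeddings = model(batch)                  # speaker embeddings
    logits = classifier(embeddings)            # scores over pseudo-speaker classes
    losses = criterion(logits, pseudo_labels)  # per-utterance loss values

    # Loss gate: low-loss samples are treated as having reliable pseudo labels;
    # the remaining samples are dropped from this update.
    mask = losses < gate
    if not mask.any():
        return 0.0  # nothing deemed reliable in this batch, skip the update

    loss = losses[mask].mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the gate value would need to be tuned or scheduled over training; the sketch only illustrates the selection mechanism implied by the abstract.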
Pages: 6142-6146
Page count: 5
Related Papers
50 items in total
  • [1] Barlow Twins self-supervised learning for robust speaker recognition
    Mohammadamini, Mohammad
    Matrouf, Driss
    Bonastre, Jean-Francois
    Dowerah, Sandipana
    Serizel, Romain
    Jouvet, Denis
    INTERSPEECH 2022, 2022, : 4033 - 4037
  • [2] Gated Self-supervised Learning for Improving Supervised Learning
    Fuadi, Erland Hillman
    Ruslim, Aristo Renaldo
    Wardhana, Putu Wahyu Kusuma
    Yudistira, Novanto
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 611 - 615
  • [3] Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
    Chen, Sanyuan
    Wu, Yu
    Wang, Chengyi
    Liu, Shujie
    Chen, Zhuo
    Wang, Peidong
    Liu, Gang
    Li, Jinyu
    Wu, Jian
    Yu, Xiangzhan
    Wei, Furu
    INTERSPEECH 2022, 2022, : 3699 - 3703
  • [4] Curriculum learning for self-supervised speaker verification
    Heo, Hee-Soo
    Jung, Jee-weon
    Kang, Jingu
    Kwon, Youngki
    Kim, You Jin
    Lee, Bong-Jin
    Chung, Joon Son
    INTERSPEECH 2023, 2023, : 4693 - 4697
  • [5] Self-Supervised Learning for Online Speaker Diarization
    Chien, Jen-Tzung
    Luo, Sixun
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 2036 - 2042
  • [6] SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION
    Gat, Itai
    Aronowitz, Hagai
    Zhu, Weizhong
    Morais, Edmilson
    Hoory, Ron
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7342 - 7346
  • [7] ROBUST SPEAKER VERIFICATION WITH JOINT SELF-SUPERVISED AND SUPERVISED LEARNING
    Wang, Kai
    Zhang, Xiaolei
    Zhang, Miao
    Li, Yuguang
    Lee, Jaeyun
    Cho, Kiho
    Park, Sung-Un
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7637 - 7641
  • [8] A Novel Self-supervised Representation Learning Model for an Open-Set Speaker Recognition
    Ohi, Abu Quwsar
    Gavrilova, Marina L.
    COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL MANAGEMENT, CISIM 2023, 2023, 14164 : 270 - 282
  • [9] Self-supervised Speaker Diarization
    Dissen, Yehoshua
    Kreuk, Felix
    Keshet, Joseph
    INTERSPEECH 2022, 2022, : 4013 - 4017
  • [10] Contrastive Information Maximization Clustering for Self-Supervised Speaker Recognition
    Fathan, Abderrahim
    Alam, Jahangir
    2024 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI 2024, 2024, : 383 - 388