SELF-SUPERVISED SPEAKER RECOGNITION WITH LOSS-GATED LEARNING

Cited by: 21
Authors
Tao, Ruijie [1]
Lee, Kong Aik [2]
Das, Rohan Kumar [3]
Hautamaki, Ville [1, 4]
Li, Haizhou [1, 5]
Affiliations
[1] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[2] ASTAR, Inst Infocomm Res, Singapore, Singapore
[3] Fortemedia Singapore, Singapore, Singapore
[4] Univ Eastern Finland, Kuopio, Finland
[5] Chinese Univ Hong Kong, Shenzhen, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
self-supervised speaker recognition; pseudo label selection; loss-gated learning;
DOI
10.1109/ICASSP43922.2022.9747162
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification code
070206; 082403;
Abstract
In self-supervised learning for speaker recognition, pseudo labels are useful as the supervision signals. It is a known fact that a speaker recognition model doesn't always benefit from pseudo labels due to their unreliability. In this work, we observe that a speaker recognition network tends to model the data with reliable labels faster than those with unreliable labels. This motivates us to study a loss-gated learning (LGL) strategy, which extracts the reliable labels through the fitting ability of the neural network during training. With the proposed LGL, our speaker recognition model obtains a 46.3% performance gain over the system without it. Further, the proposed self-supervised speaker recognition with LGL trained on the VoxCeleb2 dataset without any labels achieves an equal error rate of 1.66% on the VoxCeleb1 original test set.
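The gating idea in the abstract can be illustrated with a minimal sketch. It assumes that "reliable" means a per-sample classification loss below a fixed threshold; the abstract does not state the actual criterion, and the threshold value, the function names (`cross_entropy`, `loss_gated_mean`), and the toy logits are all hypothetical choices made for this illustration, not details from the paper.

```python
import math


def cross_entropy(logits, label):
    """Per-sample cross-entropy from raw logits (stable log-sum-exp)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def loss_gated_mean(batch_logits, pseudo_labels, threshold):
    """Average the loss over samples whose loss is below `threshold`.

    Samples at or above the threshold are treated as carrying unreliable
    pseudo labels and are excluded from the update (assumed gating rule).
    Returns (mean_loss, kept_indices).
    """
    losses = [cross_entropy(l, y) for l, y in zip(batch_logits, pseudo_labels)]
    kept = [i for i, loss in enumerate(losses) if loss < threshold]
    if not kept:
        return 0.0, kept
    return sum(losses[i] for i in kept) / len(kept), kept


# Toy batch: samples 0 and 2 are fitted well, sample 1 disagrees with its label.
batch_logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 4.0], [0.0, 3.0, 0.0]]
pseudo_labels = [0, 0, 1]
mean_loss, kept = loss_gated_mean(batch_logits, pseudo_labels, threshold=1.0)
print(kept)  # [0, 2]
```

In a real training loop the same mask would be applied to the per-sample losses of a speaker encoder before backpropagation, so that only the samples the network already fits well drive the gradient.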
Pages: 6142-6146
Page count: 5
Related papers
50 in total
  • [21] SELF-SUPERVISED METRIC LEARNING WITH GRAPH CLUSTERING FOR SPEAKER DIARIZATION
    Singh, Prachi
    Ganapathy, Sriram
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 90 - 97
  • [22] Augmentation Adversarial Training for Self-Supervised Speaker Representation Learning
    Kang, Jingu
    Huh, Jaesung
    Heo, Hee Soo
    Chung, Joon Son
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1253 - 1262
  • [23] Supervised and Self-Supervised Learning for Assembly Line Action Recognition
    Indris, Christopher
    Ibrahim, Fady
    Ibrahem, Hatem
    Bramesfeld, Gotz
    Huo, Jie
    Ahmad, Hafiz Mughees
    Hayat, Syed Khizer
    Wang, Guanghui
    JOURNAL OF IMAGING, 2025, 11 (01)
  • [24] Incorporating Visual Information in Audio Based Self-Supervised Speaker Recognition
    Cai, Danwei
    Wang, Weiqing
    Li, Ming
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1422 - 1435
  • [25] SELF-SUPERVISED SPEAKER VERIFICATION WITH SIMPLE SIAMESE NETWORK AND SELF-SUPERVISED REGULARIZATION
    Sang, Mufan
    Li, Haoqi
    Liu, Fang
    Arnold, Andrew O.
    Wan, Li
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6127 - 6131
  • [26] Contrastive Self-Supervised Learning for Skeleton Action Recognition
    Gao, Xuehao
    Yang, Yang
    Du, Shaoyi
    NEURIPS 2020 WORKSHOP ON PRE-REGISTRATION IN MACHINE LEARNING, VOL 148, 2020, 148 : 51 - 61
  • [27] Self-supervised representation learning for surgical activity recognition
    Paysan, Daniel
    Haug, Luis
    Bajka, Michael
    Oelhafen, Markus
    Buhmann, Joachim M.
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2021, 16 (11) : 2037 - 2044
  • [28] Self-Supervised Learning for Action Recognition by Video Denoising
    Thi Thu Trang Phung
    Thi Hong Thu Ma
    Van Truong Nguyen
    Duc Quang Vu
    2021 RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF 2021), 2021, : 76 - 81
  • [29] Self-Supervised ECG Representation Learning for Emotion Recognition
    Sarkar, Pritam
    Etemad, Ali
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2022, 13 (03) : 1541 - 1554
  • [30] Transferable Self-Supervised Instance Learning for Sleep Recognition
    Zhao, Aite
    Wang, Yue
    Li, Jianbo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 4464 - 4477