Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Cited by: 8
Authors
Chowdhury, Labib [1 ]
Zunair, Hasib [2 ]
Mohammed, Nabeel [1 ]
Affiliations
[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh
[2] Concordia Univ, Gina Cody Sch Engn & Comp Sci, Montreal, PQ H3G, Canada
Source
APPLIED SCIENCES-BASEL, 2020, Vol. 10, No. 21
Keywords
speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding;
DOI
10.3390/app10217522
CLC Classification
O6 [Chemistry]
Subject Classification
0703
Abstract
Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. Deep-convolutional-network-based approaches such as SincNet, whose first layer performs convolution with parameterized sinc functions, have demonstrated superior results in this area and serve as an alternative to i-vectors. SincNet, however, optimizes a softmax loss integrated into the classification layer that makes the final predictions. Because this loss only increases inter-class distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome this limitation, this study proposes a family of models that improve upon the state-of-the-art SincNet model by adopting angular margin losses. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, are potential successors to the successful SincNet model. They are compared on several speaker-recognition datasets, such as TIMIT and LibriSpeech, each with its own challenges, and show consistent improvements over competitive baselines. In inter-dataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well to unseen and diverse tasks such as Bengali speaker recognition.
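For readers unfamiliar with margin-based objectives, the sketch below illustrates one common way an additive angular margin (ArcFace-style) classification head can replace a plain softmax layer on top of speaker embeddings. This is a minimal PyTorch sketch for illustration only, not the authors' AF-SincNet/Ensemble-SincNet/ALL-SincNet implementation; the class name, hyperparameter values, and the 462-speaker count are assumptions made for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AngularMarginHead(nn.Module):
    """Margin-penalized softmax head over L2-normalized embeddings (ArcFace-style sketch)."""

    def __init__(self, embedding_dim, num_speakers, scale=30.0, margin=0.2):
        super().__init__()
        # One learnable "class center" per speaker, compared by cosine similarity.
        self.weight = nn.Parameter(torch.empty(num_speakers, embedding_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale = scale    # logit scale (radius of the hypersphere)
        self.margin = margin  # additive angular margin, in radians

    def forward(self, embeddings, labels):
        # Cosine similarity between normalized embeddings and normalized class centers.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Convert to angles and add the margin only for the ground-truth class.
        theta = torch.acos(cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        target = F.one_hot(labels, num_classes=self.weight.size(0)).bool()
        theta = torch.where(target, theta + self.margin, theta)
        # Scale the margin-penalized cosines and apply ordinary cross-entropy.
        return F.cross_entropy(self.scale * torch.cos(theta), labels)


# Toy usage: random 256-d embeddings stand in for the network's speaker embeddings.
head = AngularMarginHead(embedding_dim=256, num_speakers=462)  # 462 is an assumed speaker count
emb = torch.randn(8, 256)
lbl = torch.randint(0, 462, (8,))
loss = head(emb, lbl)
loss.backward()

In a setup like the one the abstract describes, such a head would typically sit on top of the embedding network during training in place of the plain softmax layer, while at test time only the embeddings or cosine scores are used for identification.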
Pages: 1-17
Page count: 17
Related papers
50 items in total
  • [21] Disentangled Representation Learning for Multilingual Speaker Recognition
    Nam, Kihyun
    Kim, Youkyum
    Huh, Jaesung
    Heo, Hee-Soo
    Jung, Jee-weon
    Chung, Joon Son
    INTERSPEECH 2023, 2023, : 5316 - 5320
  • [22] Mixture Representation Learning for Deep Speaker Embedding
    Lin, Weiwei
    Mak, Man-Wai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 968 - 978
  • [23] Contrastive Speaker Representation Learning with Hard Negative Sampling for Speaker Recognition
    Go, Changhwan
    Lee, Young Han
    Kim, Taewoo
    Park, Nam In
    Chun, Chanjun
    SENSORS, 2024, 24 (19)
  • [24] Speaker Representation Learning via Contrastive Loss with Maximal Speaker Separability
    Li, Zhe
    Mak, Man-Wai
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 962 - 967
  • [25] Fair Loss: Margin-Aware Reinforcement Learning for Deep Face Recognition
    Liu, Bingyu
    Deng, Weihong
    Zhong, Yaoyao
    Wang, Mei
    Hu, Jiani
    Tao, Xunqiang
    Huang, Yaohai
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 10051 - 10060
  • [27] Learning robust latent representation for discriminative regression
    Cui, Jinrong
    Zhu, Qi
    Wang, Ding
    Li, Zuoyong
    PATTERN RECOGNITION LETTERS, 2019, 117 : 193 - 200
  • [28] Robust facial expression recognition with global-local joint representation learning
    Fan, Chunxiao
    Wang, Zhenxing
    Li, Jia
    Wang, Shanshan
    Sun, Xiao
    MULTIMEDIA SYSTEMS, 2023, 29 (05) : 3069 - 3079
  • [29] Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition
    Luo, Danqing
    Zou, Yuexian
    Huang, Dongyan
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 152 - 156
  • [30] Minimum margin loss for deep face recognition
    Wei, Xin
    Wang, Hui
    Scotney, Bryan
    Wan, Huan
    PATTERN RECOGNITION, 2020, 97