Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Cited by: 8
Authors
Chowdhury, Labib [1]
Zunair, Hasib [2]
Mohammed, Nabeel [1]
Affiliations
[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh
[2] Concordia Univ, Gina Cody Sch Engn & Comp Sci, Montreal, PQ H3G, Canada
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 21
Keywords
speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding
DOI
10.3390/app10217522
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. For speaker identification, deep-convolutional-network-based approaches, such as SincNet, are used as an alternative to i-vectors. Convolution performed by parameterized sinc functions in SincNet demonstrated superior results in this area. This system optimizes softmax loss, which is integrated in the classification layer that is responsible for making predictions. Since the nature of this loss is only to increase interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome this limitation, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, serve as potential successors to the successful SincNet model. They are compared on a number of speaker-recognition datasets, such as TIMIT and LibriSpeech, each with its own unique challenges. Performance improvements are demonstrated compared to competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well on unseen and diverse tasks such as Bengali speaker recognition.
Pages: 1 / 17
Number of pages: 17
Related papers
50 in total
  • [31] Speaker recognition based on deep learning: An overview
    Bai, Zhongxin
    Zhang, Xiao-Lei
    NEURAL NETWORKS, 2021, 140 : 65 - 99
  • [32] Deep learning methods in speaker recognition: A review
    Sztahó, D.
    Szaszák, G.
    Beke, A.
    Periodica polytechnica Electrical engineering and computer science, 2021, 65 (04) : 310 - 328
  • [33] Speaker Recognition with Deep Learning Approaches: A Review
    Alenizi, Abdulrahman S.
    Al-Karawi, Khamis A.
    PROCEEDINGS OF NINTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY, VOL 5, ICICT 2024, 2024, 1000 : 481 - 499
  • [34] Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation
    Mitsui, Kentaro
    Koriyama, Tomoki
    Saruwatari, Hiroshi
    SPEECH COMMUNICATION, 2021, 132 : 132 - 145
  • [35] Deep Metric Learning with Angular Loss
    Wang, Jian
    Zhou, Feng
    Wen, Shilei
    Liu, Xiao
    Lin, Yuanqing
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 2612 - 2620
  • [36] Angular Margin-Mining Softmax Loss for Face Recognition
    Lee, Jwajin
    Wang, Yooseung
    Cho, Sunyoung
    IEEE ACCESS, 2022, 10 : 43071 - 43080
  • [37] Inter-class angular margin loss for face recognition
    Sun, Jingna
    Yang, Wenming
    Gao, Riqiang
    Xue, Jing-Hao
    Liao, Qingmin
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 80
  • [38] QAMFACE: QUADRATIC ADDITIVE ANGULAR MARGIN LOSS FOR FACE RECOGNITION
    Zhao, He
    Shi, Yongjie
    Tong, Xin
    Ying, Xianghua
    Zha, Hongbin
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2141 - 2145
  • [39] RAPID JOINT SPEAKER AND NOISE COMPENSATION FOR ROBUST SPEECH RECOGNITION
    Chin, K. K.
    Xu, Haitian
    Gales, Mark J. F.
    Breslin, Catherine
    Knill, Kate
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5500 - 5503
  • [40] Robust joint learning network: improved deep representation learning for person re-identification
    Tian, Yumin
    Li, Qiang
    Wang, Di
    Wan, Bo
    Multimedia Tools and Applications, 2019, 78 : 24187 - 24203