Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Cited by: 8

Authors:

Chowdhury, Labib [1]

Zunair, Hasib [2]

Mohammed, Nabeel [1]

Affiliations:

[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh

[2] Concordia Univ, Gina Cody Sch Engn & Comp Sci, Montreal, PQ H3G, Canada

Source:

APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 21

Keywords:

speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding;

DOI:

10.3390/app10217522

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. For speaker identification, deep-convolutional-network-based approaches, such as SincNet, are used as an alternative to i-vectors. Convolution performed by parameterized sinc functions in SincNet demonstrated superior results in this area. This system optimizes softmax loss, which is integrated in the classification layer that is responsible for making predictions. Since the nature of this loss is only to increase interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome this issue, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, serve as potential successors to the successful SincNet model. The proposed models are compared on a number of speaker-recognition datasets, such as TIMIT and LibriSpeech, each with its own unique challenges. Performance improvements are demonstrated compared to competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well on unseen and diverse tasks such as Bengali speaker recognition.


Pages: 1 / 17

Page count: 17

