OPTIMIZING NEURAL NETWORK EMBEDDINGS USING A PAIR-WISE LOSS FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

Cited by: 0

Authors:

Dhamyal, Hira [1]

Zhou, Tianyan [1]

Raj, Bhiksha [1]

Singh, Rita [1]

Affiliation:

[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA

Source:

2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019

Keywords:

quartet loss; embeddings; neural-networks; speaker verification; DISCRIMINANT-ANALYSIS

DOI:

10.1109/asru46091.2019.9003794

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes a new loss function, called the "quartet" loss, for better optimization of neural networks on matching tasks. In such tasks the network's embeddings are the key component, so optimizing the network to produce better embeddings is critical. The embeddings must be class discriminative, with minimal intra-class variation and maximal inter-class variation, even for unseen classes, so that the network generalizes well. The quartet loss explicitly computes the distance metric between pairs of inputs and widens the gap between the similarity-score distributions of same-class pairs and different-class pairs. We evaluate on the speaker verification task and demonstrate the performance of the loss on our proposed neural network.
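The abstract does not give the formula for the quartet loss, so the following is only a rough sketch of the general idea it describes: score one same-class pair and one different-class pair, then penalize the network when the gap between the two scores is too small. The cosine similarity, the hinge form, the `margin` value, and the function names are all assumptions made for illustration, not the authors' formulation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_margin_loss(same_pair, diff_pair, margin=0.5):
    """Hypothetical pair-wise hinge loss over four embeddings.

    same_pair: two embeddings from the same class (e.g. same speaker)
    diff_pair: two embeddings from different classes
    The loss is zero once the same-class score exceeds the
    different-class score by at least `margin` (an assumed value).
    """
    s_same = cosine_similarity(*same_pair)
    s_diff = cosine_similarity(*diff_pair)
    return max(0.0, margin - (s_same - s_diff))


# Well-separated case: identical same-class embeddings, orthogonal
# different-class embeddings, so the gap already exceeds the margin.
good = pairwise_margin_loss(([1.0, 0.0], [1.0, 0.0]),
                            ([1.0, 0.0], [0.0, 1.0]))

# Poorly separated case: the same-class pair is orthogonal while the
# different-class pair is identical, so the loss is large.
bad = pairwise_margin_loss(([1.0, 0.0], [0.0, 1.0]),
                           ([1.0, 0.0], [1.0, 0.0]))
```

In this toy setup the well-separated case incurs no penalty while the poorly separated case does, which is the behavior one would expect from any loss meant to push the two similarity-score distributions apart.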


Pages: 742-748

Page count: 7
