OPTIMIZING NEURAL NETWORK EMBEDDINGS USING A PAIR-WISE LOSS FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

被引:0
|
作者
Dhamyal, Hira [1 ]
Zhou, Tianyan [1 ]
Raj, Bhiksha [1 ]
Singh, Rita [1 ]
机构
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
关键词
quartet loss; embeddings; neural-networks; speaker verification; DISCRIMINANT-ANALYSIS;
D O I
10.1109/asru46091.2019.9003794
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper proposes a new loss function called the "quartet" loss for the better optimization of the neural networks for matching tasks. For such tasks, where neural network embeddings are the key component, the optimization of the network for better embeddings is critical. The embeddings are required to be class discriminative, resulting in minimal inter-class variation and maximal intra-class variation even for unseen classes for better generalization of the network. The quartet loss explicitly computes the distance metric between pairs of inputs and increases the gap between the similarity score distributions between the same class pairs and the different class pairs. We evaluate on the speaker verification task and demonstrate the performance of the loss on our proposed neural network.
引用
收藏
页码:742 / 748
页数:7
相关论文
共 50 条
  • [21] Acoustic Feature Shuffling Network for Text-Independent Speaker Verification
    Li, Jin
    Fang, Xin
    Chu, Fan
    Gao, Tian
    Song, Yan
    Dai, Lirong
    INTERSPEECH 2022, 2022, : 4790 - 4794
  • [22] Efficient text-independent speaker verification with structural Gaussian mixture models and neural network
    Xiang, B
    Berger, T
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11 (05): : 447 - 456
  • [23] Automatic text-independent speaker verification using convolutional deep belief network
    Rakhmanenko, I. A.
    Shelupanov, A. A.
    Kostyuchenko, E. Y.
    COMPUTER OPTICS, 2020, 44 (04) : 596 - +
  • [24] Text independent speaker verification using modular neural network
    Um, IT
    Won, JJ
    Kim, MH
    IJCNN 2000: PROCEEDINGS OF THE IEEE-INNS-ENNS INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOL VI, 2000, : 97 - 102
  • [25] Pseudo-Phoneme Label Loss for Text-Independent Speaker Verification
    Niu, Mengqi
    He, Liang
    Fang, Zhihua
    Zhao, Baowei
    Wang, Kai
    APPLIED SCIENCES-BASEL, 2022, 12 (15):
  • [27] Graphical models for text-independent speaker verification
    Sánchez-Soto, E
    Sigelle, M
    Chollet, G
    NONLINEAR SPEECH MODELING AND APPLICATIONS, 2005, 3445 : 410 - 415
  • [28] TEXT-INDEPENDENT SPEAKER VERIFICATION USING 3D CONVOLUTIONAL NEURAL NETWORKS
    Toifi, Amirsina
    Dawson, Jeremy
    Nasrabadi, Nasser M.
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [29] Text-independent speaker verification using Support Vector Machines
    Kharroubi, J
    Chollet, G
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 4017 - 4017
  • [30] Language dependency in text-independent speaker verification
    Auckenthaler, R
    Carey, MJ
    Mason, JSD
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING, 2001, : 441 - 444