Speaker Identification using Triplet Loss Function Combined with Clustering Techniques

Authors
Shalaby, Mohamed [1]
Hassan, Mohamed [1]
Omar, Yasser M. K. [1]
Affiliations
[1] Arab Acad Sci & Technol, Dept Comp Sci, Cairo, Egypt
Keywords
Neural network; Speech recognition; triplet loss function; RECOGNITION;
DOI
10.1109/ITMS52826.2021.9615342
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Speaker identification plays a critical role in many applications, such as robotics, especially those that focus on humanoid robots. Speaker identification involves comparing unknown utterances against pre-stored utterances of known speakers. In general, the encoded features of the known speakers are stored in a database, and 1:N comparisons are performed between the encoded features extracted from an unknown utterance and those of the N pre-stored known speakers. Different techniques can be used for these comparisons, of which cosine similarity is the most widely used. However, the larger the number of pre-stored known speakers, the longer the model needs to finish these comparisons, and hence the approach may not be suitable for real-time applications. In this paper, we combined a previously published Triple Neural Network for speaker identification with clustering techniques applied to the speaker dataset. We employed different clustering techniques and presented two different methods for comparing unknown utterances against pre-stored utterances. The results showed a significant reduction in comparison time with a small reduction in accuracy. The proposed approach provides a framework that can represent a trade-off between execution time and accuracy.
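The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithms or the two comparison methods, so the sketch assumes a simple spherical k-means over unit-norm embeddings and a two-stage search (nearest centroid first, then only that cluster's members). All function names, the embedding size, and the synthetic data are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def brute_force_identify(query, enrolled):
    """1:N search: compare the query against every enrolled (unit-norm) embedding."""
    sims = enrolled @ l2_normalize(query)
    return int(np.argmax(sims)), len(enrolled)

def spherical_kmeans(enrolled, k, iters=20, seed=0):
    """Assumed clustering step: k-means using cosine similarity on unit vectors."""
    rng = np.random.default_rng(seed)
    centroids = enrolled[rng.choice(len(enrolled), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmax(enrolled @ centroids.T, axis=1)
        for j in range(k):
            members = enrolled[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = l2_normalize(members.mean(axis=0))
    labels = np.argmax(enrolled @ centroids.T, axis=1)
    return centroids, labels

def clustered_identify(query, enrolled, centroids, labels):
    """Two-stage search: pick the nearest centroid, then search only its members."""
    q = l2_normalize(query)
    cluster = int(np.argmax(centroids @ q))
    member_idx = np.flatnonzero(labels == cluster)
    if member_idx.size == 0:  # fall back to the full search for an empty cluster
        return brute_force_identify(query, enrolled)
    sims = enrolled[member_idx] @ q
    return int(member_idx[np.argmax(sims)]), len(centroids) + member_idx.size

# Synthetic stand-in for stored speaker embeddings (hypothetical sizes).
rng = np.random.default_rng(42)
enrolled = l2_normalize(rng.normal(size=(200, 32)))
centroids, labels = spherical_kmeans(enrolled, k=8)
query = enrolled[17] + 0.01 * rng.normal(size=32)  # noisy utterance of speaker 17

# Each call returns (predicted speaker index, number of comparisons made).
print(brute_force_identify(query, enrolled))
print(clustered_identify(query, enrolled, centroids, labels))
```

The second return value makes the cost visible: the full search always performs N comparisons, while the clustered search performs k centroid comparisons plus one per member of the selected cluster. The accuracy loss arises when a query falls nearest to a centroid other than the one its true speaker was assigned to.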
Pages: 5