Speaker Identification using Triplet Loss Function Combined with Clustering Techniques

Authors
Shalaby, Mohamed [1]
Hassan, Mohamed [1]
Omar, Yasser M. K. [1]
Affiliations
[1] Arab Acad Sci & Technol, Dept Comp Sci, Cairo, Egypt
Keywords
Neural network; Speech recognition; Triplet loss function; Recognition
DOI
10.1109/ITMS52826.2021.9615342
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Speaker identification plays a critical role in many applications, such as robotics, and especially in applications that focus on humanoid robotics. Speaker identification involves comparing unknown utterances against pre-stored utterances of known speakers. In general, the encoded features of the known speakers are stored in a database, and 1:N comparisons are performed between the encoded features extracted from an unknown utterance and those of the N pre-stored known speakers. Different techniques can be used for these comparisons, of which cosine similarity is the most widely used. However, the larger the number of pre-stored known speakers, the longer the model needs to finish these comparisons, so the approach may not be suitable for real-time applications. In this paper, we combined a previously published Triple Neural Network for speaker identification with clustering techniques applied to the speaker dataset. We employed different clustering techniques and presented two different methods for comparing unknown utterances against pre-stored utterances. The results showed a significant reduction in comparison time at the cost of a small reduction in accuracy. The proposed approach provides a framework that can represent a trade-off between execution time and accuracy.
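The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify their two comparison methods or their clustering algorithms, so the spherical k-means step, the function names, and the synthetic embeddings below are all assumptions made for illustration. The sketch contrasts a full 1:N cosine-similarity search with a search restricted to the cluster whose centroid is closest to the query.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def brute_force_identify(query, enrolled):
    """Full 1:N search: compare the query against every enrolled embedding."""
    sims = normalize(enrolled) @ normalize(query)
    return int(np.argmax(sims))


def build_clusters(enrolled, k, iters=20, seed=0):
    """Hypothetical clustering step: a simple spherical k-means on unit vectors."""
    rng = np.random.default_rng(seed)
    x = normalize(enrolled)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(x @ centroids.T, axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centroids[j] = normalize(members.mean(axis=0))
    # Recompute assignments so labels match the final centroids.
    labels = np.argmax(x @ centroids.T, axis=1)
    return centroids, labels


def clustered_identify(query, enrolled, centroids, labels):
    """Pick the nearest centroid, then search only that cluster's members."""
    q = normalize(query)
    nearest = int(np.argmax(centroids @ q))
    candidates = np.flatnonzero(labels == nearest)
    if candidates.size == 0:
        # Empty cluster: fall back to the full search.
        return brute_force_identify(query, enrolled)
    sims = normalize(enrolled[candidates]) @ q
    return int(candidates[np.argmax(sims)])


# Synthetic stand-ins for speaker embeddings (assumed: 200 speakers, 32 dims).
rng = np.random.default_rng(42)
enrolled = rng.normal(size=(200, 32))
centroids, labels = build_clusters(enrolled, k=8)
query = enrolled[17]
print(brute_force_identify(query, enrolled))
print(clustered_identify(query, enrolled, centroids, labels))
```

The clustered search scores only the k centroids plus the members of one cluster instead of all N embeddings, which is where the time saving comes from. It can miss the correct speaker when a query falls near a cluster boundary, which is one plausible source of the accuracy loss the abstract reports.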
Pages: 5