Speaker Identification using Triplet Loss Function Combined with Clustering Techniques

Cited by: 0
Authors
Shalaby, Mohamed [1]
Hassan, Mohamed [1]
Omar, Yasser M. K. [1]
Affiliations
[1] Arab Acad Sci & Technol, Dept Comp Sci, Cairo, Egypt
Keywords
Neural network; Speech recognition; Triplet loss function; Recognition
DOI
10.1109/ITMS52826.2021.9615342
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Speaker identification plays a critical role in many applications, such as robotics, and especially in applications that focus on humanoid robots. Speaker identification involves comparing unknown utterances against pre-stored utterances of known speakers. In general, encoded features are extracted and stored for each speaker in a database of known speakers, and 1:N comparisons are performed between the encoded features extracted from an unknown utterance and those of the N pre-stored known speakers. Different techniques can be used for these comparisons, of which cosine similarity is the most widely used. However, the larger the number of pre-stored known speakers, the longer the model needs to complete these comparisons, and hence the approach may not be suitable for real-time applications. In this paper, we combine a previously published Triple Neural Network for speaker identification with clustering techniques applied to the speaker dataset. We employ different clustering techniques and present two different methods for comparing unknown utterances against pre-stored utterances. The results show a significant reduction in comparison time at the cost of a small reduction in accuracy. The proposed approach provides a framework for trading off execution time against accuracy.
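The trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the clustering techniques or the two comparison methods, so the sketch assumes a simple k-means-style grouping with cosine-similarity assignment and a two-stage search (nearest cluster centroid first, then the speakers in that cluster). The function names, the toy 3-dimensional embeddings, and the speaker labels are all hypothetical; in the paper the embeddings would come from the trained network.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_exhaustive(query, enrolled):
    """1:N search: score the query against every enrolled speaker."""
    return max(enrolled, key=lambda name: cosine(query, enrolled[name]))


def cluster_speakers(enrolled, k, iters=20, seed=0):
    """Group enrolled embeddings with a k-means-style loop (assumed technique).

    Returns a list of (centroid, member_names) pairs for non-empty clusters.
    """
    names = sorted(enrolled)
    rng = random.Random(seed)
    centroids = [list(enrolled[n]) for n in rng.sample(names, k)]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for n in names:
            best = max(range(len(centroids)),
                       key=lambda c: cosine(enrolled[n], centroids[c]))
            groups[best].append(n)
        # Recompute each centroid as the mean of its members; keep it if empty.
        for c, members in enumerate(groups):
            if members:
                columns = zip(*(enrolled[m] for m in members))
                centroids[c] = [sum(col) / len(members) for col in columns]
    return [(cen, mem) for cen, mem in zip(centroids, groups) if mem]


def identify_clustered(query, enrolled, clusters):
    """Two-stage search: pick the closest centroid, then search only its members."""
    _, members = max(clusters, key=lambda cl: cosine(query, cl[0]))
    return max(members, key=lambda name: cosine(query, enrolled[name]))


# Hypothetical enrolled speakers with toy embeddings.
enrolled = {
    "a": [1.0, 0.05, 0.0],
    "b": [0.9, 0.3, 0.0],
    "c": [0.0, 1.0, 0.1],
    "d": [0.1, 0.9, 0.4],
}
clusters = cluster_speakers(enrolled, k=2)
query = [0.95, 0.25, 0.0]
print(identify_exhaustive(query, enrolled), identify_clustered(query, enrolled, clusters))
```

With N enrolled speakers and k clusters, the two-stage search performs k centroid comparisons plus one comparison per member of the selected cluster, rather than N comparisons. If the query lands in the wrong cluster, the true speaker is never scored, which is one plausible source of the accuracy loss the abstract reports.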
Pages: 5