Multimodal Speaker Identification Based on Text and Speech

Authors
Moschonas, Panagiotis [1]
Kotropoulos, Constantine [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
Keywords
multimodal speaker identification; text; speech; probabilistic latent semantic indexing; Mel-frequency cepstral coefficients; nearest neighbor classifier; convex combination
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker's utterance is processed by probabilistic latent semantic indexing (PLSI), which offers a powerful means of modeling each speaker's vocabulary through a number of hidden topics that are closely related to his/her identity, function, or expertise. Mel-frequency cepstral coefficients (MFCCs) are extracted from each speech frame, and their dynamic range is quantized into a number of predefined bins in order to compute local MFCC histograms for each speech utterance, which is time-aligned with the transcribed text. Two identity scores are computed independently: one by PLSI applied to the text, and one by a nearest neighbor classifier applied to the local MFCC histograms. A convex combination of the two scores is shown to be more accurate than either individual score in speaker identification experiments conducted on broadcast news from the RT-03 MDE Training Data Text and Annotations corpus distributed by the Linguistic Data Consortium.
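The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the bin count, the distance measure, the score normalization, or the combination weight, so the L1 distance, the 1/(1+d) score mapping, the single-coefficient histogram, and all function names and the weight `alpha` below are assumptions made for illustration only. The text scores are taken as given, since reproducing PLSI is beyond a short example.

```python
from typing import Dict, List, Sequence


def quantize_histogram(values: Sequence[float], lo: float, hi: float,
                       n_bins: int) -> List[float]:
    """Normalized histogram of one coefficient over equal-width bins in [lo, hi].

    Assumption: equal-width bins; values outside the range are clamped
    into the first or last bin.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute differences between two histograms (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def nearest_neighbor_scores(query: Sequence[float],
                            references: Dict[str, Sequence[float]]
                            ) -> Dict[str, float]:
    """Turn distances to each speaker's reference histogram into scores.

    Assumption: score = 1 / (1 + distance), rescaled to sum to 1.
    """
    raw = {spk: 1.0 / (1.0 + l1_distance(query, ref))
           for spk, ref in references.items()}
    total = sum(raw.values())
    return {spk: s / total for spk, s in raw.items()}


def fuse_scores(text_scores: Dict[str, float],
                speech_scores: Dict[str, float],
                alpha: float) -> Dict[str, float]:
    """Convex combination: alpha * text + (1 - alpha) * speech, 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {spk: alpha * text_scores[spk] + (1.0 - alpha) * speech_scores[spk]
            for spk in text_scores}


def identify(scores: Dict[str, float]) -> str:
    """Return the speaker with the highest score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical scores for two speakers; not data from the paper.
    text = {"anchor": 0.6, "reporter": 0.4}
    speech = {"anchor": 0.3, "reporter": 0.7}
    print(identify(fuse_scores(text, speech, alpha=0.5)))
```

Because the weights `alpha` and `1 - alpha` are non-negative and sum to one, the fused score always lies between the two individual scores, and setting `alpha` to 0 or 1 recovers the speech-only or text-only decision.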
Pages: 100-109
Number of pages: 10