Multimodal Speaker Identification Based on Text and Speech

被引:0
|
作者
Moschonas, Panagiotis [1 ]
Kotropoulos, Constantine [1 ]
机构
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
来源
关键词
multimodal speaker identification; text; speech; probabilistic latent semantic indexing; Mel-frequency cepstral coefficients; nearest neighbor classifier; convex combination;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker's utterance is processed by the probabilistic latent semantic indexing (PLSI) that offers a powerful means to model each speaker's vocabulary employing a number of hidden topics, which are closely related to his/her identity, function, or expertise. Mel-frequency cepstral coefficients (MFCCs) axe extracted from each speech frame and their dynamic range is quantized to a number of predefined bins in order to compute MFCC local histograms for each speech utterance, which is time-aligned with the transcribed text. Two identity scores are independently computed by the PLSI applied to the text and the nearest neighbor classifier applied to the local MFCC histograms. It is demonstrated that a convex combination of the two scores is more accurate than the individual scores on speaker identification experiments conducted on broadcast news of the RT-03 MDE Training Data Text and Annotations corpus distributed by the Linguistic Data Consortium.
引用
收藏
页码:100 / 109
页数:10
相关论文
共 50 条
  • [31] SPEAKER IDENTIFICATION WITH DISTANT MICROPHONE SPEECH
    Jin, Qin
    Li, Runxin
    Yang, Qian
    Laskowski, Kornel
    Schultz, Tanja
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4518 - 4521
  • [32] Speech activated telephony email reader (SATER) based on speaker verification and text-to-speech conversion
    Wu, CH
    Chen, JH
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 1997, 43 (03) : 707 - 716
  • [33] Speaker Identification using Whispered Speech
    Jawarkar, Naresh P.
    Holambe, Raghunath S.
    Basu, Tapan Kumar
    2013 INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT 2013), 2013, : 778 - 781
  • [34] Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data
    Huang, Wen-Chin
    Wu, Yi-Chiao
    Toda, Tomoki
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 2995 - 2999
  • [35] SPEAKER INTONATION ADAPTATION FOR TRANSFORMING TEXT-TO-SPEECH SYNTHESIS SPEAKER IDENTITY
    Langarani, Mahsa Sadat Elyasi
    van Santen, Jan
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 116 - 123
  • [36] Speaker Re-identification with Speaker Dependent Speech Enhancement
    Shi, Yanpei
    Huang, Qiang
    Hain, Thomas
    INTERSPEECH 2020, 2020, : 1530 - 1534
  • [37] Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech
    Sarma, Biswajit Dev
    Das, Rohan Kumar
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 610 - 615
  • [38] Speaker identification based text to audio alignment for an audio retrieval system
    Roy, D
    Malamud, C
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1099 - 1102
  • [39] Text-independent speaker identification based on support vector machines
    He, Xin
    Liu, Chongqing
    Li, Jiegu
    Jisuanji Gongcheng/Computer Engineering, 2000, 26 (06): : 61 - 63
  • [40] Efficient Text Independent Speaker Identification Based on GFCC and CMN Methods
    Tazi, El Bachir
    Benabbou, Abderrahim
    Harti, Mostafa
    2012 INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS (ICMCS), 2012, : 90 - 95