Automatic speaker recognition with crosslanguage speech material

Cited by: 10

Author:

Kuenzel, Hermann J. [1]

Affiliation:

[1] Univ Marburg, D-35032 Marburg, Germany

Source:

INTERNATIONAL JOURNAL OF SPEECH LANGUAGE AND THE LAW | 2013 / Vol. 20 / No. 01

Keywords:

FORENSIC SPEAKER RECOGNITION; AUTOMATIC SPEAKER RECOGNITION; CROSS-LANGUAGE SPEECH MATERIAL; TRANSMISSION CHANNEL CHARACTERISTICS;

DOI:

10.1558/ijsll.v20i1.21

Chinese Library Classification (CLC) number:

DF [Law]; D9 [Law];

Subject classification code:

0301 ;

Abstract:

Automatic systems for forensic speaker recognition (FASR) claim to be largely independent of language based on the fact that feature vectors are composed of acoustic parameters that are derived from the resonance characteristics of vocal tract cavities. Yet a certain 'language gap' may remain which may deteriorate the performance of a system unless properly compensated. This forensic aspect of what may be called cross-language speaker recognition has not yet received due attention. Based on the most common forensic cross-language setting, the aim of this study was to assess the effect of language mismatch on the performance of a standard FASR system and compare its magnitude with the effect of other sources of mismatch on the same voice data. Using the automatic system Batvox 3 in an experiment with 75 bilingual speakers of seven languages and four kinds of transmission channels, it can be shown that, if speaker model and reference population are matched in terms of language, the remaining mismatch between speaker model and test sample can be neglected, since equal error rates (EERs) for same-language or cross-language comparisons are approximately the same, ranging from zero to 5.6%. Transmission of the speech data via landline telephone, GSM and, for part of the corpus, VoIP (using Skype) caused EERs to rise by less than 1% on average.

Pages: 21-44

Page count: 24
