Robust model for speaker verification against session-dependent utterance variation

Cited by: 0
Authors
Matsui, T [1 ]
Aikawa, K
Affiliations
[1] Inst Stat Math, Tokyo 1068569, Japan
[2] NTT Corp, NTT Commun Sci Labs, Tokyo 1008116, Japan
Source
Keywords
speaker verification; speaker model; session dependent; utterance variation; handset dependent distortion;
DOI
Not available
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
This paper investigates a new method for creating speaker models that are robust against inter-session variation in a continuous-HMM-based speaker verification system. The method estimates session-independent parameters by decomposing inter-session variation into two distinct parts: session-dependent and session-independent. The parameters of the speaker models are estimated with the speaker adaptive training algorithm combined with equalization of the session-dependent variation. The resulting models capture session-independent speaker characteristics more reliably than conventional models, and their discriminative power improves accordingly. Moreover, the models are made more invariant to handset variation in the public switched telephone network (PSTN) by treating session-dependent variation and handset-dependent distortion separately. Text-independent speech data recorded by 20 speakers in seven sessions over 16 months was used to evaluate the new approach. The proposed method reduces the error rate by a relative 15%. Compared with the popular cepstral mean normalization, the error rate is reduced by a relative 24% when the speaker models are created using speech data recorded in four or more sessions.
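The cepstral mean normalization (CMN) baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard CMN technique, not the paper's proposed method; the array shapes and toy values are assumptions for demonstration only.

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance mean from each cepstral dimension.

    cepstra: array of shape (n_frames, n_coeffs).
    A stationary convolutional channel effect (e.g. a fixed handset
    transfer function) appears as an additive constant in the cepstral
    domain, so subtracting the utterance mean removes it.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Toy example (hypothetical values): a fixed channel offset added to
# every frame is removed by CMN.
clean = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
channel_offset = np.array([0.5, -0.2])
observed = clean + channel_offset
normalized = cepstral_mean_normalization(observed)
# normalized has zero mean in each cepstral dimension, and equals the
# clean features up to their own per-utterance mean.
```

CMN removes only channel effects that are constant over an utterance, which is why the paper's session-level decomposition can outperform it when recordings span many sessions and handsets.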
Pages: 712-718 (7 pages)