DEEP NEURAL NETWORK TRAINED WITH SPEAKER REPRESENTATION FOR SPEAKER NORMALIZATION

Cited by: 0

Authors:

Tang, Yun [1]

Mohan, Aanchan [2]

Rose, Richard C. [2]

Ma, Chengyuan [1]

Affiliations:

[1] Nuance Commun, Burlington, MA 01803 USA

[2] McGill Univ, Montreal, PQ, Canada

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

Neural networks; speaker adaptation; speaker normalization; transformations

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

A method for speaker normalization in deep neural network (DNN) based discriminative feature estimation for automatic speech recognition (ASR) is presented. The method is applied to a DNN configured for auto-encoder-based low-dimensional bottleneck (AE-BN) feature extraction, where the derived features are used as input to a continuous Gaussian density hidden Markov model (HMM/GMM) based ASR decoder. While AE-BN features are known to provide a significant reduction in ASR word error rate (WER) compared with more conventional spectral-magnitude-based features, there is no general agreement on how these networks can reduce the impact of speaker variability by incorporating prior knowledge of the speaker. This paper presents an approach in which spectrum-based DNN inputs are augmented with speaker inputs derived from separate regression-based speaker transformations. It is shown that the proposed method can reduce the WER by 3% relative to the best speaker-adapted AE-BN CDHMM system.
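The input augmentation described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the speaker inputs are combined with the spectral inputs, so per-frame concatenation is an assumption here, and the function name `augment_frames`, the vector sizes, and the toy values are all hypothetical.

```python
from typing import List, Sequence


def augment_frames(
    frames: Sequence[Sequence[float]],
    speaker_vector: Sequence[float],
) -> List[List[float]]:
    """Append the same speaker vector to every frame from one speaker.

    Assumption: the speaker inputs are a fixed-length vector per speaker,
    and augmentation means concatenating it to each frame's features.
    """
    return [list(frame) + list(speaker_vector) for frame in frames]


# Hypothetical toy data: 3 frames of 4 spectral features each,
# plus a 2-dimensional speaker vector (not values from the paper).
frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
speaker_vector = [-1.0, 2.0]

# Each augmented frame now has 4 + 2 = 6 inputs for the network.
augmented = augment_frames(frames, speaker_vector)
```

In this sketch the speaker vector is constant across all frames of a speaker, so the network would see the same speaker description alongside every spectral frame.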


Pages: 5
