DEEP NEURAL NETWORK TRAINED WITH SPEAKER REPRESENTATION FOR SPEAKER NORMALIZATION

Authors
Tang, Yun [1]
Mohan, Aanchan [2]
Rose, Richard C. [2]
Ma, Chengyuan [1]
Affiliations
[1] Nuance Commun, Burlington, MA 01803 USA
[2] McGill Univ, Montreal, PQ, Canada
Keywords
Neural networks; speaker adaptation; speaker normalization; transformations
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
A method for speaker normalization in deep neural network (DNN) based discriminative feature estimation for automatic speech recognition (ASR) is presented. The method is applied to a DNN configured for auto-encoder based low-dimensional bottleneck (AE-BN) feature extraction, where the derived features are used as input to a continuous Gaussian density hidden Markov model (HMM/GMM) based ASR decoder. While AE-BN features are known to provide significant reductions in ASR word error rate (WER) with respect to more conventional spectral magnitude based features, there is no general agreement on how these networks can reduce the impact of speaker variability by incorporating prior knowledge of the speaker. This paper presents an approach in which spectrum-based DNN inputs are augmented with speaker inputs derived from separate regression-based speaker transformations. It is shown that the proposed method can reduce the WER by 3% relative to the best speaker-adapted AE-BN CDHMM system.
Pages: 5
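
The input augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only that spectrum-based inputs are "augmented" with speaker inputs, so plain concatenation is an assumption here, as are all function names, layer sizes, and the single hidden layer. The weights are random and untrained, and the random `speaker_vec` merely stands in for speaker inputs that the paper derives from regression-based speaker transformations.

```python
import math
import random


def augment_with_speaker_vector(frames, speaker_vec):
    """Append the same speaker vector to every spectral frame (assumed: concatenation)."""
    return [list(frame) + list(speaker_vec) for frame in frames]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, biases, activation=None):
    """Fully connected layer; `weights` holds one row per output unit."""
    out = []
    for w_row, b in zip(weights, biases):
        z = sum(w * v for w, v in zip(w_row, x)) + b
        out.append(activation(z) if activation else z)
    return out


def random_layer(rng, n_out, n_in, scale=0.1):
    """Untrained placeholder weights; a real system would learn these."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


# Hypothetical sizes chosen only for illustration.
rng = random.Random(0)
n_frames, n_spec, n_spk, n_hidden, n_bn = 6, 40, 10, 32, 8

frames = [[rng.gauss(0.0, 1.0) for _ in range(n_spec)] for _ in range(n_frames)]
speaker_vec = [rng.gauss(0.0, 1.0) for _ in range(n_spk)]

# Each network input is a spectral frame followed by the speaker vector.
augmented = augment_with_speaker_vector(frames, speaker_vec)

# One sigmoid hidden layer, then a narrow linear bottleneck layer.
w_h, b_h = random_layer(rng, n_hidden, n_spec + n_spk)
w_bn, b_bn = random_layer(rng, n_bn, n_hidden)
features = [dense(dense(x, w_h, b_h, sigmoid), w_bn, b_bn) for x in augmented]

print(len(augmented[0]), len(features[0]))  # 50 8
```

In a system of the kind the abstract describes, the low-dimensional `features` would be passed to the HMM-based decoder in place of the raw spectral frames. Training the network and estimating the speaker inputs are outside the scope of this sketch.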