DEEP NEURAL NETWORK TRAINED WITH SPEAKER REPRESENTATION FOR SPEAKER NORMALIZATION

Authors
Tang, Yun [1 ]
Mohan, Aanchan [2 ]
Rose, Richard C. [2 ]
Ma, Chengyuan [1 ]
Affiliations
[1] Nuance Commun, Burlington, MA 01803 USA
[2] McGill Univ, Montreal, PQ, Canada
Keywords
Neural networks; speaker adaptation; speaker normalization; transformations
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A method for speaker normalization in deep neural network (DNN) based discriminative feature estimation for automatic speech recognition (ASR) is presented. The method is applied to a DNN configured for auto-encoder based low-dimensional bottleneck (AE-BN) feature extraction, where the derived features are used as input to a continuous Gaussian density hidden Markov model (HMM/GMM) based ASR decoder. While AE-BN features are known to provide a significant reduction in ASR word error rate (WER) with respect to more conventional spectral magnitude based features, there is no general agreement on how these networks can reduce the impact of speaker variability by incorporating prior knowledge of the speaker. This paper presents an approach in which spectrum-based DNN inputs are augmented with speaker inputs derived from separate regression-based speaker transformations. It is shown that the proposed method reduces the WER by 3% relative to the best speaker-adapted AE-BN CDHMM system.
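The input augmentation described in the abstract can be illustrated with a minimal sketch. It assumes that a fixed-length speaker vector is simply appended to every spectral frame before the frame enters the DNN. The abstract does not give the exact form of the speaker inputs or how they are combined with the spectral inputs, so the function names, the vector sizes, and all numeric values below are hypothetical. The second helper assumes that "3% relative" means a relative WER reduction.

```python
from typing import List


def augment_with_speaker_vector(
    frames: List[List[float]], speaker_vec: List[float]
) -> List[List[float]]:
    """Append the same per-speaker vector to every spectral frame.

    Hypothetical helper: the paper derives its speaker inputs from
    regression-based speaker transformations; here the vector is a
    placeholder supplied by the caller.
    """
    return [list(frame) + list(speaker_vec) for frame in frames]


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy data, not taken from the paper: two 3-dimensional spectral frames
# and a 2-dimensional speaker vector.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
speaker_vec = [1.0, -1.0]

augmented = augment_with_speaker_vector(frames, speaker_vec)
# Each augmented frame now has 3 + 2 = 5 dimensions.
print(len(augmented), len(augmented[0]))

# Illustrative numbers only: a drop from 10.0% to 9.7% WER is a
# 3% relative reduction.
print(round(relative_wer_reduction(10.0, 9.7), 4))
```

Under this reading, a 3% relative reduction is a small absolute change: for a baseline of 10.0% WER it corresponds to 0.3 percentage points.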
Pages: 5