DEEP NEURAL NETWORK TRAINED WITH SPEAKER REPRESENTATION FOR SPEAKER NORMALIZATION

Cited by: 0
Authors
Tang, Yun [1 ]
Mohan, Aanchan [2 ]
Rose, Richard C. [2 ]
Ma, Chengyuan [1 ]
Affiliations
[1] Nuance Commun, Burlington, MA 01803 USA
[2] McGill Univ, Montreal, PQ, Canada
Keywords
Neural networks; speaker adaptation; speaker normalization; transformations
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A method for speaker normalization in deep neural network (DNN) based discriminative feature estimation for automatic speech recognition (ASR) is presented. The method is applied to a DNN configured for auto-encoder based low-dimensional bottleneck (AE-BN) feature extraction, where the derived features are used as input to a continuous Gaussian density hidden Markov model (CDHMM; HMM/GMM) based ASR decoder. While AE-BN features are known to provide a significant reduction in ASR word error rate (WER) with respect to more conventional spectral magnitude based features, there is no general agreement on how these networks can reduce the impact of speaker variability by incorporating prior knowledge of the speaker. An approach is presented in this paper where spectrum-based DNN inputs are augmented with speaker inputs that are derived from separate regression-based speaker transformations. It is shown that the proposed method reduces the WER by 3% relative to the best speaker-adapted AE-BN CDHMM system.
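The input augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature values, and the assumption that each speaker is summarized by one fixed vector appended to every frame are all hypothetical. The second helper only shows how a relative WER reduction such as the reported 3% is computed; the WER figures passed to it are made up.

```python
from typing import List, Sequence


def augment_frames(frames: Sequence[Sequence[float]],
                   speaker_vector: Sequence[float]) -> List[List[float]]:
    """Append one speaker vector to every spectral frame of that speaker.

    Assumption: the speaker inputs are a single fixed-length vector per
    speaker; the abstract does not specify their exact form.
    """
    return [list(frame) + list(speaker_vector) for frame in frames]


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Toy data (hypothetical): two 3-dimensional spectral frames and a
# 2-dimensional speaker vector.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
speaker_vector = [1.0, -1.0]

# Each augmented frame has 3 + 2 = 5 dimensions and would be fed to the DNN.
augmented = augment_frames(frames, speaker_vector)
```

Under this sketch the DNN input dimension grows by the length of the speaker vector, while the network itself is unchanged.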
Pages: 5