Correlation Networks for Speaker Normalization in Automatic Speech Recognition

Authors
Sharon, Rini A. [1]
Kothinti, Sandeep Reddy [1]
Umesh, Srinivasan [1]
Affiliations
[1] Indian Inst Technol Madras, Chennai, Tamil Nadu, India
Keywords
Automatic speech recognition; Correlational Neural Networks; fMLLR; Multi-view; Common representation learning; speaker normalization; i-vectors; PSEUDO-FMLLR
DOI
10.21437/Interspeech.2018-1612
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose using common representation learning (CRL) for speaker normalization in automatic speech recognition (ASR). Conventional methods such as feature-space maximum likelihood linear regression (fMLLR) require two-pass decoding, and their performance is often limited by the amount of data available at test time. While i-vectors do not require two-pass decoding, a significant number of input frames is required for their estimation. Hence, as an alternative, we propose a regression model that employs correlational neural networks (CorrNet) for multi-view CRL. In this approach, the CorrNet training methodology treats normalized and un-normalized features as two parallel views of the same speech data. Once trained, the network generates frame-wise fMLLR-like features, thus overcoming the limitations of fMLLR and i-vectors. The recognition accuracy obtained with the proposed CorrNet-generated features is comparable to that of the i-vector counterparts and significantly better than that of un-normalized features such as filterbank features. With CorrNet features, we obtain absolute word error rate improvements of 2.5% for TIMIT, 2.69% for WSJ84 and 3.2% for Switchboard-33hour over un-normalized features.
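The two-view training idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the network architecture or loss, so the objective below follows the generic correlational-network formulation (reconstruct both views from both inputs and from each input alone, while rewarding correlation between the two single-view hidden codes). The function names, the single tanh hidden layer, the dimensions, and the weight `lam` are all assumptions made for illustration.

```python
import numpy as np


def init_params(d_x, d_y, d_h, rng):
    """Hypothetical single-hidden-layer parameters (sizes are illustrative)."""
    return {
        "W_x": rng.normal(scale=0.1, size=(d_x, d_h)),  # encoder, view x
        "W_y": rng.normal(scale=0.1, size=(d_y, d_h)),  # encoder, view y
        "b": np.zeros(d_h),                             # shared hidden bias
        "V_x": rng.normal(scale=0.1, size=(d_h, d_x)),  # decoder to view x
        "V_y": rng.normal(scale=0.1, size=(d_h, d_y)),  # decoder to view y
    }


def encode(params, x=None, y=None):
    """Common representation computed from whichever views are present."""
    pre = params["b"]
    if x is not None:
        pre = pre + x @ params["W_x"]
    if y is not None:
        pre = pre + y @ params["W_y"]
    return np.tanh(pre)


def recon_error(params, h, x, y):
    """Mean squared reconstruction error of both views from hidden code h."""
    err_x = ((h @ params["V_x"] - x) ** 2).sum(axis=1).mean()
    err_y = ((h @ params["V_y"] - y) ** 2).sum(axis=1).mean()
    return err_x + err_y


def correlation_term(h_x, h_y, eps=1e-8):
    """Sum over hidden units of the Pearson correlation across frames."""
    cx = h_x - h_x.mean(axis=0)
    cy = h_y - h_y.mean(axis=0)
    num = (cx * cy).sum(axis=0)
    den = np.sqrt((cx ** 2).sum(axis=0) * (cy ** 2).sum(axis=0)) + eps
    return (num / den).sum()


def corrnet_loss(params, x, y, lam=1.0):
    """Reconstruction from both views and from each view alone, minus
    lam times the correlation between the two single-view codes."""
    h_both = encode(params, x=x, y=y)
    h_x = encode(params, x=x)
    h_y = encode(params, y=y)
    recon = (recon_error(params, h_both, x, y)
             + recon_error(params, h_x, x, y)
             + recon_error(params, h_y, x, y))
    return float(recon - lam * correlation_term(h_x, h_y))


def predict_normalized(params, x):
    """Test time: only un-normalized frames x are available, so the
    normalized view is regressed frame by frame from x alone."""
    return encode(params, x=x) @ params["V_y"]


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 40))  # stand-in for un-normalized frames
y = rng.normal(size=(32, 40))  # stand-in for normalized (fMLLR-like) frames
params = init_params(40, 40, 16, rng)
loss = corrnet_loss(params, x, y)
y_hat = predict_normalized(params, x)
```

In this sketch `x` plays the role of the un-normalized view and `y` the normalized view; after training (any gradient-based optimizer would do, which is omitted here), `predict_normalized` needs only `x`, which is what allows per-frame output without a second decoding pass.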
Pages: 882-886
Number of pages: 5