Correlation Networks for Speaker Normalization in Automatic Speech Recognition

Citations: 0

Authors:

Sharon, Rini A. [1]

Kothinti, Sandeep Reddy [1]

Umesh, Srinivasan [1]

Affiliations:

[1] Indian Inst Technol Madras, Chennai, Tamil Nadu, India

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Automatic speech recognition; Correlational Neural Networks; fMLLR; Multi-view; Common representation learning; speaker normalization; i-vectors; PSEUDO-FMLLR;

DOI:

10.21437/Interspeech.2018-1612

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we propose using common representation learning (CRL) for speaker normalization in automatic speech recognition (ASR). Conventional methods such as feature-space maximum likelihood linear regression (fMLLR) require two-pass decoding, and their performance is often limited by the amount of data available at test time. While i-vectors do not require two-pass decoding, a significant number of input frames is required for their estimation. Hence, as an alternative, a regression model employing correlational neural networks (CorrNet) for multi-view CRL is proposed. In this approach, the CorrNet training methodology treats normalized and un-normalized features as two parallel views of the same speech data. Once trained, the network generates frame-wise fMLLR-like features, thus overcoming the limitations of fMLLR and i-vectors. The recognition accuracy obtained with the proposed CorrNet-generated features is comparable to that of the i-vector-based counterparts and significantly better than that of un-normalized features such as filterbank features. With CorrNet features, we obtain an absolute improvement in word error rate of 2.5% for TIMIT, 2.69% for WSJ84 and 3.2% for Switchboard-33hour over un-normalized features.
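The two-view idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the architecture, so the single hidden layer, the layer sizes, the weight `lam`, and every function name below are assumptions. The objective follows the general CorrNet formulation (reconstruct both views from either view or from both, while maximizing the correlation between the two single-view hidden representations). The "normalized" view here is synthetic toy data standing in for real fMLLR features, and no training loop is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(dim_x, dim_y, dim_h, seed=0):
    """Randomly initialise a one-hidden-layer CorrNet (sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "W": rng.normal(0, s, (dim_x, dim_h)),      # encoder, un-normalized view
        "V": rng.normal(0, s, (dim_y, dim_h)),      # encoder, normalized view
        "b": np.zeros(dim_h),
        "W_out": rng.normal(0, s, (dim_h, dim_x)),  # decoder to un-normalized view
        "V_out": rng.normal(0, s, (dim_h, dim_y)),  # decoder to normalized view
        "bx": np.zeros(dim_x),
        "by": np.zeros(dim_y),
    }

def encode(p, x=None, y=None):
    """Hidden representation from one view or both (a missing view contributes 0)."""
    z = p["b"]
    if x is not None:
        z = z + x @ p["W"]
    if y is not None:
        z = z + y @ p["V"]
    return sigmoid(z)

def decode(p, h):
    """Reconstruct both views from a hidden representation."""
    return h @ p["W_out"] + p["bx"], h @ p["V_out"] + p["by"]

def correlation(hx, hy, eps=1e-8):
    """Sum over hidden units of the Pearson correlation across frames."""
    hx_c = hx - hx.mean(axis=0)
    hy_c = hy - hy.mean(axis=0)
    num = (hx_c * hy_c).sum(axis=0)
    den = np.sqrt((hx_c ** 2).sum(axis=0) * (hy_c ** 2).sum(axis=0)) + eps
    return float((num / den).sum())

def corrnet_loss(p, x, y, lam=1.0):
    """Reconstruction errors from (x, y), x alone, y alone, minus lam * correlation."""
    def recon_err(h):
        x_hat, y_hat = decode(p, h)
        return np.mean((x_hat - x) ** 2) + np.mean((y_hat - y) ** 2)
    hx, hy, hxy = encode(p, x=x), encode(p, y=y), encode(p, x=x, y=y)
    return float(recon_err(hxy) + recon_err(hx) + recon_err(hy)
                 - lam * correlation(hx, hy))

def pseudo_normalized(p, x):
    """Predict the normalized view frame by frame from the un-normalized view alone."""
    _, y_hat = decode(p, encode(p, x=x))
    return y_hat

# Toy data: 200 frames of 40-dim "filterbank" features and a synthetic
# linear transform of them standing in for the fMLLR-normalized view.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 40))
y = x @ rng.normal(0, 0.2, (40, 40)) + 0.05 * rng.normal(size=(200, 40))

params = init_params(dim_x=40, dim_y=40, dim_h=64)
loss = corrnet_loss(params, x, y)
y_hat = pseudo_normalized(params, x)  # one output frame per input frame
```

At test time only `pseudo_normalized` would be needed, which is what lets such a model work frame by frame without a first decoding pass. In practice the parameters would be trained by minimizing `corrnet_loss` with a gradient-based optimizer.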


Pages: 882-886

Page count: 5
