Improving Acoustic Models for Russian Spontaneous Speech Recognition

被引:10
|
作者
Prudnikov, Alexey [1 ,2 ]
Medennikov, Ivan [2 ,3 ]
Mendelev, Valentin [1 ]
Korenevsky, Maxim [1 ,2 ]
Khokhlov, Yuri [3 ]
机构
[1] Speech Technol Ctr Ltd, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] STC Innovat Ltd, St Petersburg, Russia
来源
关键词
Speech recognition; Russian spontaneous speech; Deep neural networks; Speaker adaptation; I-vectors; Bottleneck features; ADAPTATION;
D O I
10.1007/978-3-319-23132-7_29
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The aim of the paper is to investigate the ways to improve acoustic models for Russian spontaneous speech recognition. We applied the main steps of the Kaldi Switchboard recipe to a Russian dataset but obtained low accuracy with respect to the results for English spontaneous telephone speech. We found two methods to be especially useful for Russian spontaneous speech: the i-vector based deep neural network adaptation and speaker-dependent bottleneck features which provide 8.6% and 11.9% relative word error rate reduction over the baseline system respectively.
引用
收藏
页码:234 / 242
页数:9
相关论文
共 50 条
  • [41] Decision tree-based acoustic models for speech recognition
    Masami Akamine
    Jitendra Ajmera
    EURASIP Journal on Audio, Speech, and Music Processing, 2012
  • [42] Context-dependent acoustic models for Chinese speech recognition
    Ma, B
    Huang, TY
    Xu, B
    Zhang, XJ
    Qu, F
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 455 - 458
  • [43] SPEECH RECOGNITION - ACOUSTIC, PHONETIC AND FORMAL-LANGUAGE MODELS
    MERMELSTEIN, P
    LEVINSON, S
    BIOTELEMETRY, 1975, 2 (1-2) : 121 - 123
  • [44] Speech Recognition with Factorial-HMM Syllabic Acoustic Models
    Coro, Gianpaolo
    Cutugno, Francesco
    Caropreso, Fulvio
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1497 - 1500
  • [45] Boosting HMM acoustic models in large vocabulary speech recognition
    Meyer, C
    Schramm, H
    SPEECH COMMUNICATION, 2006, 48 (05) : 532 - 548
  • [46] Decision tree-based acoustic models for speech recognition
    Akamine, Masami
    Ajmera, Jitendra
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2012,
  • [47] Context-independent acoustic models for Thai speech recognition
    Kasuriya, S
    Kanokphara, S
    Thatphithakkul, N
    Cotsomrong, P
    Sunpethniyom, T
    IEEE INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES 2004 (ISCIT 2004), PROCEEDINGS, VOLS 1 AND 2: SMART INFO-MEDIA SYSTEMS, 2004, : 991 - 994
  • [48] A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition
    Xiao, Xiong
    Li, Jinyu
    Chng, Eng Siong
    Li, Haizhou
    Lee, Chin-Hui
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06): : 1158 - 1169
  • [49] Building DNN acoustic models for large vocabulary speech recognition
    Maas, Andrew L.
    Qi, Peng
    Xie, Ziang
    Hannun, Awni Y.
    Lengerich, Christopher T.
    Jurafsky, Daniel
    Ng, Andrew Y.
    COMPUTER SPEECH AND LANGUAGE, 2017, 41 : 195 - 213
  • [50] Acoustic Modelling for Speech Recognition: Hidden Markov Models and Beyond?
    Gales, M. J. F.
    2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 44 - 44