Improving Acoustic Models for Russian Spontaneous Speech Recognition

被引:10
|
作者
Prudnikov, Alexey [1 ,2 ]
Medennikov, Ivan [2 ,3 ]
Mendelev, Valentin [1 ]
Korenevsky, Maxim [1 ,2 ]
Khokhlov, Yuri [3 ]
机构
[1] Speech Technol Ctr Ltd, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] STC Innovat Ltd, St Petersburg, Russia
来源
关键词
Speech recognition; Russian spontaneous speech; Deep neural networks; Speaker adaptation; I-vectors; Bottleneck features; ADAPTATION;
D O I
10.1007/978-3-319-23132-7_29
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The aim of the paper is to investigate the ways to improve acoustic models for Russian spontaneous speech recognition. We applied the main steps of the Kaldi Switchboard recipe to a Russian dataset but obtained low accuracy with respect to the results for English spontaneous telephone speech. We found two methods to be especially useful for Russian spontaneous speech: the i-vector based deep neural network adaptation and speaker-dependent bottleneck features which provide 8.6% and 11.9% relative word error rate reduction over the baseline system respectively.
引用
收藏
页码:234 / 242
页数:9
相关论文
共 50 条
  • [21] Improving language models for radiology speech recognition
    Paulett, John M.
    Langlotz, Curtis P.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2009, 42 (01) : 53 - 58
  • [22] Acoustic Analysis and Automatic Recognition of Spontaneous Children's Speech
    Gerosa, M.
    Giuliani, D.
    Narayanan, S.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1886 - +
  • [23] Improving Speech Recognition for the Elderly: A New Corpus of Elderly Japanese Speech and Investigation of Acoustic Modeling for Speech Recognition
    Fukuda, Meiko
    Nishizaki, Hiromitsu
    Iribe, Yurie
    Nishimura, Ryota
    Kitaoka, Norihide
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 6578 - 6585
  • [24] Improving Acoustic Models in TORGO Dysarthric Speech Database
    Joy, Neethu Mariam
    Umesh, S.
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2018, 26 (03) : 637 - 645
  • [25] On Improving Acoustic Models For TORGO Dysarthric Speech Database
    Joy, Neethu Mariam
    Umesh, S.
    Abraham, Basil
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2695 - 2699
  • [26] Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition
    Toyama, Shohei
    Saito, Daisuke
    Minematsu, Nobuaki
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 543 - 547
  • [27] Online Generation of Acoustic Models for Multilingual Speech Recognition
    Raab, Martin
    Aradilla, Guillermo
    Gruhn, Rainer
    Noeth, Elmar
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2979 - +
  • [28] Boosting acoustic models in large vocabulary speech recognition
    Meyer, C
    Schramm, H
    PROCEEDINGS OF THE SIXTH IASTED INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING, 2004, : 255 - 260
  • [29] Efficient Sparse Banded Acoustic Models for Speech Recognition
    Zhang, Weibin
    Fung, Pascale
    IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (03) : 280 - 283
  • [30] IMPROVING HYBRID CTC/ATTENTION END-TO-END SPEECH RECOGNITION WITH PRETRAINED ACOUSTIC AND LANGUAGE MODELS
    Deng, Keqi
    Cao, Songjun
    Zhang, Yike
    Ma, Long
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 76 - 82