Improving Acoustic Models for Russian Spontaneous Speech Recognition

Cited by: 10
Authors
Prudnikov, Alexey [1, 2]
Medennikov, Ivan [2, 3]
Mendelev, Valentin [1]
Korenevsky, Maxim [1, 2]
Khokhlov, Yuri [3]
Affiliations
[1] Speech Technol Ctr Ltd, St Petersburg, Russia
[2] ITMO Univ, St Petersburg, Russia
[3] STC Innovat Ltd, St Petersburg, Russia
Keywords
Speech recognition; Russian spontaneous speech; Deep neural networks; Speaker adaptation; I-vectors; Bottleneck features; Adaptation
DOI
10.1007/978-3-319-23132-7_29
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates ways to improve acoustic models for Russian spontaneous speech recognition. We applied the main steps of the Kaldi Switchboard recipe to a Russian dataset but obtained lower accuracy than that reported for English spontaneous telephone speech. We found two methods to be especially useful for Russian spontaneous speech: i-vector based deep neural network adaptation and speaker-dependent bottleneck features, which provide 8.6% and 11.9% relative word error rate reduction over the baseline system, respectively.
Pages: 234-242
Number of pages: 9
Related papers
50 in total
  • [31] Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance
    Nakamura, Masanobu
    Iwano, Koji
    Furui, Sadaoki
    COMPUTER SPEECH AND LANGUAGE, 2008, 22(02): 171-184
  • [32] Acoustic Models for the Automatic Identification of Prosodic Boundaries in Spontaneous Speech
    Falcao Teixeira, Barbara Heloha
    Mittmann, Maryuale Malvessi
    REVISTA DE ESTUDOS DA LINGUAGEM, 2018, 26(04): 1455-1488
  • [33] A robust compensation strategy for extraneous acoustic variations in spontaneous speech recognition
    Jiang, H
    Deng, L
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2002, 10(01): 9-17
  • [34] Automatic Recognition of Spontaneous Emotions in Speech Using Acoustic and Lexical Features
    Truong, Khiet P.
    Raaijmakers, Stephan
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, PROCEEDINGS, 2008, 5237 : 161 - +
  • [35] Filled Pauses and Lengthenings Detection Based on the Acoustic Features for the Spontaneous Russian Speech
    Verkhodanova, Vasilisa
    Shapranov, Vladimir
    SPEECH AND COMPUTER, 2014, 8773: 227-234
  • [36] Improving speech intelligibility in cochlear implants using acoustic models
    Vijayalakshmi, P.
    Nagarajan, T.
    Mahadevan, Preethi
    WSEAS Transactions on Signal Processing, 2011, 7(04): 131-144
  • [37] LSTM-Based Language Models for Spontaneous Speech Recognition
    Medennikov, Ivan
    Bulusheva, Anna
    SPEECH AND COMPUTER, 2016, 9811: 469-475
  • [38] Combining stochastic and linguistic language models for recognition of spontaneous speech
    Eckert, W
    Gallwitz, F
    Niemann, H
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996: 423-426
  • [39] Statistical Transformation of Language and Pronunciation Models for Spontaneous Speech Recognition
    Akita, Yuya
    Kawahara, Tatsuya
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18(06): 1539-1549
  • [40] Multilingual acoustic models for the recognition of non-native speech
    Fischer, V
    Janke, E
    Kunzmann, S
    Ross, T
    ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS, 2001: 331-334