Advanced Rich Transcription System for Estonian Speech

Cited by: 23
Authors
Alumae, Tanel [1]
Tilk, Ottokar [1]
Asadullah [1]
Affiliations
[1] Tallinn Univ Technol, Lab Language Technol, Tallinn, Estonia
Keywords
Speech recognition; Estonian; punctuation recovery; speaker identification
DOI
10.3233/978-1-61499-912-6-1
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes the current TTU speech transcription system for Estonian speech. The system is designed to handle semi-spontaneous speech, such as broadcast conversations, lecture recordings and interviews recorded in diverse acoustic conditions. The system is based on the Kaldi toolkit. Multi-condition training using background noise profiles extracted automatically from untranscribed data is used to improve the robustness of the system. Out-of-vocabulary words are recovered using a phoneme n-gram based decoding subgraph and an FST-based phoneme-to-grapheme model. The system achieves a word error rate of 8.1% on a test set of broadcast conversations. The system also performs punctuation recovery and speaker identification. Speaker identification models are trained using a recently proposed weakly supervised training method.
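For readers unfamiliar with the metric quoted in the abstract, word error rate is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and this is not the scoring procedure used by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a word error rate of 8.1% corresponds to roughly 8 word-level errors per 100 reference words. Because insertions are counted, the value can exceed 100% when the hypothesis is much longer than the reference.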
Pages: 1-8
Page count: 8