Advanced Rich Transcription System for Estonian Speech

Cited by: 23
Authors
Alumae, Tanel [1]
Tilk, Ottokar [1]
Asadullah [1]
Affiliations
[1] Tallinn Univ Technol, Lab Language Technol, Tallinn, Estonia
Keywords
Speech recognition; Estonian; punctuation recovery; speaker identification
DOI
10.3233/978-1-61499-912-6-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes the current TTU speech transcription system for Estonian speech. The system is designed to handle semi-spontaneous speech, such as broadcast conversations, lecture recordings and interviews recorded in diverse acoustic conditions. The system is based on the Kaldi toolkit. Multi-condition training using background noise profiles extracted automatically from untranscribed data is used to improve the robustness of the system. Out-of-vocabulary words are recovered using a phoneme n-gram based decoding subgraph and an FST-based phoneme-to-grapheme model. The system achieves a word error rate of 8.1% on a test set of broadcast conversations. The system also performs punctuation recovery and speaker identification. Speaker identification models are trained using a recently proposed weakly supervised training method.
Pages: 1-8
Number of pages: 8
Related papers
50 in total
  • [21] Adapting Audiovisual Speech Synthesis to Estonian
    Aller, Sven
    Fishel, Mark
    TEXT, SPEECH, AND DIALOGUE, TSD 2024, PT II, 2024, 15049 : 13 - 23
  • [22] RULES FOR ESTONIAN SIGN LANGUAGE TRANSCRIPTION
    Paabo, Regina
    Foedisch, Monika
    Hollman, Liivi
    TRAMES-JOURNAL OF THE HUMANITIES AND SOCIAL SCIENCES, 2009, 13 (04): : 401 - 424
  • [23] The IBM rich transcription 2007 speech-to-text systems for lecture meetings
    Huang, Jing
    Marcheret, Etienne
    Visweswariah, Karthik
    Libal, Vit
    Potamianos, Gerasimos
    MULTIMODAL TECHNOLOGIES FOR PERCEPTION OF HUMANS, 2008, 4625 : 429 - 441
  • [24] 1998 HTK system for transcription of conversational telephone speech
    Hain, T.
    Woodland, P.C.
    Niesler, T.R.
    Whittaker, E.W.D.
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 1999, 1 : 57 - 60
  • [25] Intelligent transcription system based on spontaneous speech processing
    Kawahara, Tatsuya
    ICKS 2007: SECOND INTERNATIONAL CONFERENCE ON INFORMATICS RESEARCH FOR DEVELOPMENT OF KNOWLEDGE SOCIETY INFRASTRUCTURE, PROCEEDINGS, 2007, : 19 - 26
  • [26] THE IBM 2009 GALE ARABIC SPEECH TRANSCRIPTION SYSTEM
    Kingsbury, Brian
    Soltau, Hagen
    Saon, George
    Chu, Stephen
    Kuo, Hong-Kwang
    Mangu, Lidia
    Ravuri, Suman
    Morgan, Nelson
    Janin, Adam
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4672 - 4675
  • [27] Slovak Broadcast News Speech Recognition and Transcription System
    Lojka, Martin
    Viszlay, Peter
    Stas, Jan
    Hladek, Daniel
    Juhar, Jozef
    ADVANCES IN NETWORK-BASED INFORMATION SYSTEMS, NBIS-2018, 2019, 22 : 385 - 394
  • [28] THE IBM 2008 GALE ARABIC SPEECH TRANSCRIPTION SYSTEM
    Saon, George
    Soltau, Hagen
    Chaudhari, Upendra
    Chu, Stephen
    Kingsbury, Brian
    Kuo, Hong-Kwang
    Mangu, Lidia
    Povey, Daniel
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4378 - 4381
  • [29] The 1998 HTK system for transcription of conversational telephone speech
    Hain, T
    Woodland, PC
    Niesler, TR
    Whittaker, EWD
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 57 - 60
  • [30] The IBM 2004 conversational telephony system for rich transcription
    Soltau, H
    Kingsbury, B
    Mangu, L
    Povey, D
    Saon, G
    Zweig, G
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 205 - 208