Advanced Rich Transcription System for Estonian Speech

Cited by: 23
Authors
Alumae, Tanel [1]
Tilk, Ottokar [1]
Asadullah [1]
Affiliations
[1] Tallinn Univ Technol, Lab Language Technol, Tallinn, Estonia
Keywords
Speech recognition; Estonian; punctuation recovery; speaker identification
DOI
10.3233/978-1-61499-912-6-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes the current TTU speech transcription system for Estonian speech. The system is designed to handle semi-spontaneous speech, such as broadcast conversations, lecture recordings and interviews recorded in diverse acoustic conditions. The system is based on the Kaldi toolkit. Multi-condition training using background noise profiles extracted automatically from untranscribed data is used to improve the robustness of the system. Out-of-vocabulary words are recovered using a phoneme n-gram based decoding subgraph and an FST-based phoneme-to-grapheme model. The system achieves a word error rate of 8.1% on a test set of broadcast conversations. The system also performs punctuation recovery and speaker identification. Speaker identification models are trained using a recently proposed weakly supervised training method.
Pages: 1 - 8
Number of pages: 8