The IBM 2006 Speech Transcription System for European Parliamentary Speeches

Cited by: 0
Authors
Ramabhadran, B. [1 ]
Siohan, O. [1 ]
Mangu, L. [1 ]
Zweig, G. [1 ]
Westphal, M. [2 ]
Schulz, H. [2 ]
Soneiro, A. [2 ]
Affiliations
[1] IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
[2] IBM Germany, EMEA Voice Technology Development, Munich, Germany
Keywords
speech recognition; automatic segmentation; cross-adaptation; randomized decision trees; TC-STAR
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
TC-STAR is a European Union-funded speech-to-speech translation project that aims to transcribe, translate, and synthesize European Parliamentary Plenary Speeches (EPPS). This paper describes IBM's English and Spanish speech recognition systems submitted to the TC-STAR 2006 Evaluation. The technical advances in this submission include: two different algorithms for automatic segmentation and speaker clustering of the input audio; a system architecture based on cross-adaptation between these two segmentation schemes, with system combination over an ensemble of systems generated by randomized decision-tree state-tying; automatic punctuation of the speech recognition output; and an additional 35 hours of in-domain EPPS acoustic training data. On the 2006 English development test set, these advances reduced the error rate by 30% relative to the best-performing system in the TC-STAR 2005 Evaluation, and they produced one of the best-performing English systems in the 2006 evaluation, with a word error rate of 8.3%.
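The abstract reports its results both as a word error rate (WER) and as a relative error reduction. A minimal sketch of how these two figures are commonly computed follows, assuming the standard definition (a word-level Levenshtein alignment, with substitutions, deletions, and insertions divided by the number of reference words) and no text normalization. The function names `word_error_rate` and `relative_reduction` are hypothetical and are not taken from the paper or from the official TC-STAR scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (0.30 means 30% relative)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One deletion ("is") and one insertion ("today") over 5 reference words.
    print(word_error_rate("the parliament is in session",
                          "the parliament in session today"))  # 0.4
    # Hypothetical WERs, not figures from the paper.
    print(relative_reduction(10.0, 7.0))
```

For example, `relative_reduction(10.0, 7.0)` gives 0.3 (up to floating-point rounding), i.e. a 30% relative reduction; these inputs are hypothetical, since the abstract does not state the 2005 baseline WER. Official scoring pipelines may also normalize case, punctuation, and spelling variants before alignment, which this sketch omits.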
Pages: 1225 / +
Page count: 2