The IBM 2006 Speech Transcription System for European Parliamentary Speeches

Cited by: 0
Authors
Ramabhadran, B. [1]
Siohan, O. [1]
Mangu, L. [1]
Zweig, G. [1]
Westphal, M. [2]
Schulz, H. [2]
Soneiro, A. [2]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] IBM Germany, EMEA Voice Technol Dev, Munich, Germany
Keywords
speech recognition; automatic segmentation; cross-adaptation; randomized decision trees; TC-STAR
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
TC-STAR is a European Union-funded speech-to-speech translation project to transcribe, translate, and synthesize European Parliamentary Plenary Speeches (EPPS). This paper describes IBM's English and Spanish speech recognition systems submitted to the TC-STAR 2006 Evaluation. The technical advances in this submission include two different algorithms for automatic segmentation and speaker clustering of the input audio; a system architecture based on cross-adaptation across these two segmentation schemes and on system combination through an ensemble of systems generated with randomized decision tree state-tying; automatic punctuation of the speech recognition output; and the incorporation of an additional 35 hours of in-domain EPPS acoustic training data. On the 2006 English development test set, these advances reduced the error rate by 30% relative to the best-performing system in the TC-STAR 2005 Evaluation, and they produced one of the best-performing English systems in the 2006 evaluation, with a word error rate of 8.3%.
Pages: 1225 / +
Page count: 2
Related papers
50 in total
  • [11] Recent improvements to IBM's speech recognition system for automatic transcription of broadcast news
    Chen, SS
    Eide, EM
    Gales, MJF
    Gopinath, RA
    Kanevsky, D
    Olsen, P
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999: 37 - 40
  • [12] Recent improvements to IBM's speech recognition system for automatic transcription of broadcast news
    Chen, S.S.
    Eide, E.M.
    Gales, M.J.F.
    Gopinath, R.A.
    Kanevsky, D.
    Olsen, P.
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 1999, 1 : 37 - 40
  • [13] Rules and Speeches: How Parliamentary Rules Affect Legislators' Speech-Making Behavior
    Giannetti, Daniela
    Pedrazzani, Andrea
    LEGISLATIVE STUDIES QUARTERLY, 2016, 41 (03) : 771 - 800
  • [14] Politicization and conflict in the relationship with the European Union: an analysis of Italian Prime Ministers' parliamentary speeches
    Salvati, Eugenio
    ITALIAN POLITICAL SCIENCE REVIEW-RIVISTA ITALIANA DI SCIENZA POLITICA, 2021, 51 (01) : 1 - 24
  • [15] IBM GALE Mandarin transcription system
    Zhang, Shilei
    Shi, Qin
    Qin, Yong
    Liu, Wen
    Chu, Stephen-M
    Kuo, Hong-Kwang
    Mangu, Lidia
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2009, 49 (SUPPL. 1): 1249 - 1253
  • [16] Advances in speech transcription at IBM under the DARPA EARS program
    Chen, Stanley F.
    Kingsbury, Brian
    Mangu, Lidia
    Povey, Daniel
    Saon, George
    Soltau, Hagen
    Zweig, Geoffrey
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (05): 1596 - 1608
  • [17] Speech translation enhanced ASR for European Parliament speeches: on the influence of ASR performance on speech translation
    Stueker, Sebastian
    Paulik, Matthias
    Kolss, Muntsin
    Fuegen, Christian
    Waibel, Alex
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 1293 - +
  • [18] The IBM 2006 GALE Arabic ASR system
    Soltau, Hagen
    Saon, George
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 349 - +
  • [19] COSEGMENTATION IN THE IBM TEXT-TO-SPEECH SYSTEM
    PICKERING, JB
    PROCEEDINGS : INSTITUTE OF ACOUSTICS, VOL 8, PART 7: SPEECH & HEARING, 1986, 8 : 385 - 392
  • [20] Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program
    Soltau, Hagen
    Saon, George
    Kingsbury, Brian
    Kuo, Hong-Kwang Jeff
    Mangu, Lidia
    Povey, Daniel
    Emami, Ahmad
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (05): 884 - 894