The IBM BOLT Speech Transcription System

Cited by: 0
Authors
Thomas, Samuel [1]
Saon, George [1]
Kuo, Hong-Kwang [1]
Mangu, Lidia [1]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
Automatic speech recognition; conversational telephone speech; deep neural networks; machine translation
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We describe the IBM automatic speech recognition (ASR) system for the DARPA Broad Operational Language Translation (BOLT) program. The system is used to transcribe conversational telephone speech (CTS) prior to machine translation for Phase 3 of the program's Activity A. The ASR system combines novel sequence-trained ensemble deep neural network acoustic models on speaker-adapted features with convolutional neural network models on two kinds of spectro-temporal representations of speech, in conjunction with a variety of class-based, neural network, and n-gram language models. Acoustic and language models for the recognition system are built on transcribed audio released under the program and are further optimized for the final machine translation task. The evaluation system has a word error rate of 32.7% on a 2-hour Egyptian Arabic development set for this task.
Pages: 3150 - 3153
Page count: 4
Related papers
50 in total
  • [1] The IBM mandarin broadcast speech transcription system
    Chu, Stephen M.
    Kuo, Hong-kwang
    Liu, Yi Y.
    Qin, Yong
    Shi, Qin
    Zweig, Geoffrey
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PTS 1-3, 2007, : 345 - +
  • [2] THE IBM 2009 GALE ARABIC SPEECH TRANSCRIPTION SYSTEM
    Kingsbury, Brian
    Soltau, Hagen
    Saon, George
    Chu, Stephen
    Kuo, Hong-Kwang
    Mangu, Lidia
    Ravuri, Suman
    Morgan, Nelson
    Janin, Adam
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4672 - 4675
  • [3] THE IBM 2008 GALE ARABIC SPEECH TRANSCRIPTION SYSTEM
    Saon, George
    Soltau, Hagen
    Chaudhari, Upendra
    Chu, Stephen
    Kingsbury, Brian
    Kuo, Hong-Kwang
    Mangu, Lidia
    Povey, Daniel
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4378 - 4381
  • [4] The IBM 2007 speech transcription system for European parliamentary speeches
    Ramabhadran, Bhuvana
    Siohan, Olivier
    Sethy, Abhinav
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 472 - +
  • [5] The IBM 2006 Speech Transcription System for European Parliamentary Speeches
    Ramabhadran, B.
    Siohan, O.
    Mangu, L.
    Zweig, G.
    Westphal, M.
    Schulz, H.
    Soneiro, A.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1225 - +
  • [6] IBM REPORTS PROGRESS IN SPEECH RECOGNITION AND TRANSCRIPTION
    MICHALOPOULOS, DA
    COMPUTER, 1980, 13 (09) : 89 - 90
  • [7] Recent improvements to IBM's speech recognition system for automatic transcription of broadcast news
    Chen, SS
    Eide, EM
    Gales, MJF
    Gopinath, RA
    Kanevsky, D
    Olsen, P
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 37 - 40
  • [8] Recent improvements to IBM's speech recognition system for automatic transcription of broadcast news
    Chen, S.S.
    Eide, E.M.
    Gales, M.J.F.
    Gopinath, R.A.
    Kanevsky, D.
    Olsen, P.
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 1999, 1 : 37 - 40
  • [9] The IBM rich transcription spring 2006 Speech-to-Text system for lecture meetings
    Huang, Jing
    Westphal, Martin
    Chen, Stanley
    Siohan, Olivier
    Povey, Daniel
    Libal, Vit
    Soneiro, Alvaro
    Schulz, Henrik
    Ross, Thomas
    Potamianos, Gerasimos
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2006, 4299 : 432 - +
  • [10] IBM GALE Mandarin transcription system
    Zhang, Shilei
    Shi, Qin
    Qin, Yong
    Liu, Wen
    Chu, Stephen-M
    Kuo, Hong-Kwang
    Mangu, Lidia
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2009, 49 (SUPPL. 1) : 1249 - 1253