The IBM BOLT Speech Transcription System

Cited by: 0
Authors
Thomas, Samuel [1]
Saon, George [1]
Kuo, Hong-Kwang [1]
Mangu, Lidia [1]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
Automatic speech recognition; conversational telephone speech; deep neural networks; machine translation
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We describe the IBM automatic speech recognition (ASR) system for the DARPA Broad Operational Language Translation (BOLT) program. The system is used to transcribe conversational telephone speech (CTS) prior to machine translation for Phase 3 of the program's Activity A. The ASR system combines novel sequence-trained ensemble deep neural network acoustic models on speaker-adapted features with convolutional neural network models on two kinds of spectro-temporal representations of speech, in conjunction with a variety of class-based, neural network, and n-gram language models. Acoustic and language models for the recognition system are built on transcribed audio released under the program and are further optimized for the final machine translation task. The evaluation system has a word error rate of 32.7% on a 2-hour Egyptian Arabic development set for this task.
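For readers unfamiliar with the metric quoted in the abstract, word error rate (WER) is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used by the authors; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and real evaluations typically apply text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name): tokens are split on whitespace,
    with no text normalization applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix handled
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words, giving 0.5. A WER of 32.7% therefore corresponds to roughly one word-level error for every three reference words.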
Pages: 3150-3153
Page count: 4
Related papers
50 in total
  • [41] The 1998 HTK system for transcription of conversational telephone speech
    Hain, T
    Woodland, PC
    Niesler, TR
    Whittaker, EWD
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999: 57 - 60
  • [42] Robust speech recognition in noisy environments: The 2001 IBM SPINE evaluation system
    Kingsbury, B
    Saon, G
    Mangu, L
    Padmanabhan, M
    Sarikaya, R
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002: 53 - 56
  • [43] Phrase splicing and variable substitution using the IBM trainable speech synthesis system
    Donovan, RE
    Franz, M
    Sorensen, JS
    Roukos, S
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999: 373 - 376
  • [44] Transcription of broadcast news - Some recent improvements to IBM's LVCSR system
    Polymenakos, L
    Olsen, P
    Kanevsky, D
    Gopinath, RA
    Gopalakrishnan, PS
    Chen, S
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998: 901 - 904
  • [45] Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System
    Kristjansson, T.
    Hershey, J.
    Olsen, P.
    Rennie, S.
    Gopinath, R.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 97 - 100
  • [46] SPEECH IMAGES ON THE IBM-PC
    COTE, AJ
    BYTE, 1983, 8 (11): 402 - &
  • [47] IBM MASTOR: Multilingual automatic speech-to-speech translator
    Gao, Yuqing
    Zhou, Bowen
    Gu, Liang
    Sarikaya, Ruhi
    Kuo, Hong-kwang
    Rosti, A-V I.
    Afify, Mohamed
    Zhu, Weizhong
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006: 6063 - 6066
  • [48] CRIM'S FRENCH SPEECH TRANSCRIPTION SYSTEM FOR ETAPE 2011
    Gupta, Vishwa
    Boulianne, Gilles
    Osterrath, Frederic
    Ouellet, Pierre
    2013 8TH INTERNATIONAL WORKSHOP ON SYSTEMS, SIGNAL PROCESSING AND THEIR APPLICATIONS (WOSSPA), 2013: 351 - 356
  • [49] The 2003 ISL rich transcription system for conversational telephony speech
    Soltau, H
    Yu, H
    Metze, F
    Fügen, C
    Jin, Q
    Jou, SC
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004: 773 - 776
  • [50] System for speech transcription and post-editing in Microsoft Word
    Salimbajevs, Askars
    Ikauniece, Indra
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017: 825 - 826