The IBM BOLT Speech Transcription System

Authors
Thomas, Samuel [1 ]
Saon, George [1 ]
Kuo, Hong-Kwang [1 ]
Mangu, Lidia [1 ]
Affiliations
[1] IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
Keywords
Automatic speech recognition; conversational telephone speech; deep neural networks; machine translation
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We describe the IBM automatic speech recognition (ASR) system for the DARPA Broad Operational Language Translation (BOLT) program. The system is used to transcribe conversational telephone speech (CTS) prior to machine translation for Phase 3 of the program's Activity A. The ASR system combines novel sequence-trained ensemble deep neural network acoustic models on speaker-adapted features with convolutional neural network models on two kinds of spectro-temporal representations of speech, in conjunction with a variety of class-based, neural network, and n-gram language models. Acoustic and language models for the recognition system are built on transcribed audio released under the program and are further optimized for the final machine translation task. The evaluation system has a word error rate of 32.7% on a 2-hour Egyptian Arabic development set for this task.
Pages: 3150-3153
Number of pages: 4
Related papers
50 in total
  • [21] The AMI system for the transcription of speech in meetings
    Hain, Thomas
    Burget, Lukas
    Dines, John
    Garau, Giulia
    Karafiat, Martin
    Lincoln, Mike
    Vepa, Jithendra
    Wan, Vincent
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 357 - +
  • [22] Developing high performance ASR in the IBM multilingual speech-to-speech translation system
    Cui, Xiaodong
    Gu, Liang
    Xiang, Bing
    Zhang, Wei
    Gao, Yuqing
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 5121 - 5124
  • [23] The IBM Speech Activity Detection System for the DARPA RATS Program
    Saon, George
    Thomas, Samuel
    Soltau, Hagen
    Ganapathy, Sriram
    Kingsbury, Brian
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3464 - 3468
  • [24] ESTIMATION OF PROBABILITIES IN THE LANGUAGE MODEL OF THE IBM SPEECH RECOGNITION SYSTEM
    NADAS, A
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1984, 32 (04): : 859 - 861
  • [25] The IBM 2015 English Conversational Telephone Speech Recognition System
    Saon, George
    Kuo, Hong-Kwang J.
    Rennie, Steven
    Picheny, Michael
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3140 - 3144
  • [26] Recent Advances of IBM's Handheld Speech Translation System
    Zhu, Weizhong
    Zhou, Bowen
    Prosser, Charles
    Krbec, Pavel
    Gao, Yuqing
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1181 - 1184
  • [27] The IBM 2016 English Conversational Telephone Speech Recognition System
    Saon, George
    Sercu, Tom
    Rennie, Steven
    Kuo, Hong-Kwang J.
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 7 - 11
  • [28] The 2005 AMI system for the transcription of speech in meetings
    Hain, T
    Burget, L
    Dines, J
    Garau, G
    Karafiat, M
    Lincoln, M
    McCowan, I
    Moore, D
    Wan, V
    Ordelman, R
    Renals, S
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2005, 3869 : 450 - 462
  • [29] Advanced Rich Transcription System for Estonian Speech
    Alumae, Tanel
    Tilk, Ottokar
    Asadullah
    HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, BALTIC HLT 2018, 2018, 307 : 1 - 8
  • [30] An Automatic Speech Transcription System for Manipuri Language
    Patel, Tanvina
    Krishna, D. N.
    Fathima, Noor
    Shah, Nisar
    Mahima, C.
    Kumar, Deepak
    Iyengar, Anuroop
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2388 - 2389