The IBM BOLT Speech Transcription System

Cited by: 0
Authors
Thomas, Samuel [1 ]
Saon, George [1 ]
Kuo, Hong-Kwang [1 ]
Mangu, Lidia [1 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
Automatic speech recognition; conversational telephone speech; deep neural networks; machine translation
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We describe the IBM automatic speech recognition (ASR) system for the DARPA Broad Operational Language Translation (BOLT) program. The system is used to transcribe conversational telephone speech (CTS) prior to machine translation for Phase 3 of the program's Activity A. The ASR system combines novel sequence-trained ensemble deep neural network acoustic models on speaker-adapted features with convolutional neural network models on two kinds of spectro-temporal representations of speech, in conjunction with a variety of class-based, neural network, and n-gram language models. Acoustic and language models for the recognition system are built on transcribed audio released under the program and further optimized for the final machine translation task. The evaluation system has a word error rate of 32.7% on a 2-hour Egyptian Arabic development set for this task.
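For readers unfamiliar with the metric quoted in the abstract, word error rate is conventionally the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition only. It is not the scoring tool used in the BOLT evaluation, and the function name `word_error_rate` and whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization, unlike real evaluation scoring pipelines.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a reported figure of 32.7% corresponds to roughly one word error for every three reference words; because insertions count as errors, the value can exceed 100% in principle.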
Pages: 3150 - 3153
Page count: 4
Related papers
50 in total
  • [31] The development of the AMI system for the transcription of speech in meetings
    Hain, T
    Burget, L
    Dines, J
    McCowan, I
    Garau, G
    Karafiat, M
    Lincoln, M
    Moore, D
    Wan, V
    Ordelman, R
    Renals, S
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2005, 3869 : 344 - 356
  • [32] The IBM speech-to-speech translation system for smartphone: Improvements for resource-constrained tasks
    Zhou, Bowen
    Cui, Xiaodong
    Huang, Songfang
    Cmejrek, Martin
    Zhang, Wei
    Xue, Jian
    Cui, Jia
    Xiang, Bing
    Daggett, Gregg
    Chaudhari, Upendra
    Maskey, Sameer
    Marcheret, Etienne
    COMPUTER SPEECH AND LANGUAGE, 2013, 27 (02): : 592 - 618
  • [33] A SPEECH WAVE-FORM INPUT AND DISPLAY SYSTEM FOR THE IBM PC
    TYLER, JEM
    JOURNAL OF MICROCOMPUTER APPLICATIONS, 1987, 10 (03): : 219 - 227
  • [34] IMPROVEMENTS TO THE IBM SPEECH ACTIVITY DETECTION SYSTEM FOR THE DARPA RATS PROGRAM
    Thomas, Samuel
    Saon, George
    Van Segbroeck, Maarten
    Narayanan, Shrikanth S.
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4500 - 4504
  • [35] The IBM expressive text-to-speech synthesis system for American English
    Pitrelli, John F.
    Bakis, Raimo
    Eide, Ellen M.
    Fernandez, Raul
    Hamza, Wael
    Picheny, Michael A.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (04): : 1099 - 1108
  • [36] The IBM Personal Speech Assistant
    Comerford, L
    Frank, D
    Gopalakrishnan, P
    Gopinath, R
    Sedivy, J
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS, 2001, : 1 - 4
  • [37] 1998 HTK system for transcription of conversational telephone speech
    Hain, T.
    Woodland, P.C.
    Niesler, T.R.
    Whittaker, E.W.D.
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 1999, 1 : 57 - 60
  • [38] Transcription System for Semi-Spontaneous Estonian Speech
    Alumaee, Tanel
    HUMAN LANGUAGE TECHNOLOGIES: THE BALTIC PERSPECTIVE, 2012, 247 : 10 - 17
  • [39] Intelligent transcription system based on spontaneous speech processing
    Kawahara, Tatsuya
    ICKS 2007: SECOND INTERNATIONAL CONFERENCE ON INFORMATICS RESEARCH FOR DEVELOPMENT OF KNOWLEDGE SOCIETY INFRASTRUCTURE, PROCEEDINGS, 2007, : 19 - 26
  • [40] Slovak Broadcast News Speech Recognition and Transcription System
    Lojka, Martin
    Viszlay, Peter
    Stas, Jan
    Hladek, Daniel
    Juhar, Jozef
    ADVANCES IN NETWORK-BASED INFORMATION SYSTEMS, NBIS-2018, 2019, 22 : 385 - 394