The IBM 2006 Speech Transcription System for European Parliamentary Speeches

Cited by: 0

Authors:

Ramabhadran, B. [1]

Siohan, O. [1]

Mangu, L. [1]

Zweig, G. [1]

Westphal, M. [2]

Schulz, H. [2]

Soneiro, A. [2]

Affiliations:

[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA

[2] IBM Germany, EMEA Voice Technol Dev, Munich, Germany

Source:

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006

Keywords:

speech recognition; automatic segmentation; cross-adaptation; randomized decision trees; TC-STAR

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

TC-STAR is a European Union-funded speech-to-speech translation project that aims to transcribe, translate, and synthesize European Parliamentary Plenary Speeches (EPPS). This paper describes IBM's English and Spanish speech recognition systems submitted to the TC-STAR 2006 Evaluation. The technical advances in this submission include two different algorithms for automatic segmentation and speaker clustering of the input audio; a system architecture based on cross-adaptation across these two segmentation schemes, with system combination through an ensemble of systems generated by randomized decision tree state-tying; automatic punctuation of the speech recognition output; and the incorporation of an additional 35 hours of in-domain EPPS acoustic training data. On the 2006 English development test set, these advances reduced the error rate by 30% relative to the best-performing system in the TC-STAR 2005 Evaluation. They also produced one of the best-performing systems in the 2006 English evaluation, with a word error rate of 8.3%.
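The abstract reports its results as a word error rate and as a relative error reduction. The following is a minimal sketch of how these two quantities are commonly computed, assuming the standard definition of word error rate as the word-level edit distance divided by the number of reference words. The function names and the example sentences are hypothetical, and the paper's own scoring setup (text normalization, scoring tool) is not described in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, deletions, insertions)
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the EPPS data.
    print(word_error_rate("the house is open", "the house was open"))
    # Hypothetical error rates, used only to illustrate the arithmetic.
    print(relative_reduction(10.0, 7.0))
```

A relative reduction is measured against the baseline error rate, so a 30% relative reduction is not the same as a drop of 30 percentage points.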


Pages: 1225 / +

Number of pages: 2


