Cross-language Bootstrapping for Unsupervised Acoustic Model Training: Rapid Development of a Polish Speech Recognition System

Cited by: 0

Authors:

Loeoef, Jonas [1]

Gollan, Christian [1]

Ney, Hermann [1]

Affiliations:

[1] Rhein Westfal TH Aachen, Lehrstuhl Informat 6, Dept Comp Sci, Aachen, Germany

Source:

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | 2009

Keywords:

speech recognition; unsupervised training; cross-language bootstrapping

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper describes the rapid development of a Polish-language speech recognition system. The system was developed without access to any transcribed acoustic training data. This was achieved by combining cross-language bootstrapping with confidence-based unsupervised acoustic model training. A Spanish acoustic model was ported to Polish using a manually constructed phoneme mapping. This initial model was then refined through iterative recognition and retraining on the untranscribed audio data. The system was trained and evaluated on recordings from the European Parliament and included several state-of-the-art speech recognition techniques in addition to unsupervised model training. Confidence-based speaker adaptive training using feature space transform adaptation, together with vocal tract length normalization and maximum likelihood linear regression, was used to refine the acoustic model. Through the combination of these techniques, good performance was achieved on the domain of parliamentary speeches.
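
The abstract names two mechanisms: rewriting target-language pronunciations with a phoneme mapping, and iterating recognition and retraining on automatically transcribed audio filtered by confidence. The Python sketch below illustrates one way these pieces might fit together. It is not the paper's implementation: the mapping entries, the 0.7 threshold, the iteration count, and the `recognize` and `retrain` callables are all hypothetical placeholders chosen for illustration.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence, Tuple


class Word(NamedTuple):
    text: str
    confidence: float  # assumed to be a posterior-style score in [0, 1]


# Hypothetical fragment of a Polish -> Spanish phoneme mapping.
# These entries are illustrative only and are not taken from the paper.
POLISH_TO_SPANISH: Dict[str, str] = {
    "a": "a",
    "e": "e",
    "sz": "s",
    "cz": "tS",
    "w": "b",
}


def map_pronunciation(phones: Sequence[str], mapping: Dict[str, str]) -> List[str]:
    """Rewrite a target-language pronunciation using source-language phonemes."""
    return [mapping[p] for p in phones]


def select_confident(hypothesis: Sequence[Word], threshold: float) -> List[Word]:
    """Keep only recognized words whose confidence reaches the threshold."""
    return [w for w in hypothesis if w.confidence >= threshold]


def unsupervised_training(
    utterances: Sequence[str],
    initial_model: object,
    recognize: Callable[[object, str], List[Word]],
    retrain: Callable[[object, List[Tuple[str, List[Word]]]], object],
    iterations: int = 3,      # assumed value
    threshold: float = 0.7,   # assumed value
) -> object:
    """Alternate recognition and retraining on untranscribed audio."""
    model = initial_model
    for _ in range(iterations):
        # Transcribe every utterance with the current model, then keep
        # only the confident words as training targets.
        selected = [
            (utt, select_confident(recognize(model, utt), threshold))
            for utt in utterances
        ]
        model = retrain(model, selected)
    return model
```

In this sketch the bootstrapped source-language model would be passed as `initial_model`, and each pass replaces it with a model retrained on the confident portion of its own output.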


Pages: 96-99

Number of pages: 4

