TRANSCRIPTION OF MULTI-GENRE MEDIA ARCHIVES USING OUT-OF-DOMAIN DATA

Cited by: 0

Authors:

Bell, P. J. [1]

Gales, M. J. F.

Lanchantin, P.

Liu, X.

Long, Y.

Renals, S. [1]

Swietojanski, P. [1]

Woodland, P. C.

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland

Source:

2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012) | 2012

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK

Keywords:

speech recognition; tandem; cross-domain adaptation; media archives

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and telecommunications technology]

Discipline classification codes:

0808; 0809

Abstract:

We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features and 8% over the best out-of-domain tandem features.

Pages: 324-329

Page count: 6
