TRANSCRIPTION OF MULTI-GENRE MEDIA ARCHIVES USING OUT-OF-DOMAIN DATA

Authors
Bell, P. J. [1]
Gales, M. J. F.
Lanchantin, P.
Liu, X.
Long, Y.
Renals, S. [1]
Swietojanski, P. [1]
Woodland, P. C.
Affiliation
[1] University of Edinburgh, Centre for Speech Technology Research, Edinburgh EH8 9AB, Midlothian, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC, UK)
Keywords
speech recognition; tandem; cross-domain adaptation; media archives
DOI
Not available
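Chinese Library Classification (CLC) codes
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809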
Abstract
We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique that uses deep neural networks to incorporate information from out-of-domain posterior features. We show that MLAN substantially reduces the word error rate (WER) compared with the other systems, giving relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features, and 8% over the best out-of-domain tandem features.
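The results above are stated as relative WER reductions, i.e. the absolute drop in WER divided by the baseline WER. The following is a minimal sketch of that arithmetic; the helper name `relative_wer_reduction` is hypothetical, and the WER values in the example are made up for illustration only and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer w.r.t. baseline_wer, in percent.

    Both arguments are WERs on the same scale (e.g. percentages).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    # Absolute improvement divided by the baseline, expressed as a percentage.
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (percent), NOT figures from the paper:
# going from 40.0% to 34.0% is a 6.0-point absolute drop, i.e. 15% relative.
print(f"{relative_wer_reduction(40.0, 34.0):.1f}%")  # 15.0%
```

Note that a 15% relative reduction says nothing by itself about the absolute WER; the same relative figure corresponds to a larger absolute drop when the baseline WER is higher.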
Pages: 324-329
Number of pages: 6