LANGUAGE MODEL ADAPTATION USING WWW DOCUMENTS OBTAINED BY UTTERANCE-BASED QUERIES

Cited by: 3
Authors
Tsiartas, Andreas [1 ]
Georgiou, Panayiotis [1 ]
Narayanan, Shrikanth [1 ]
Affiliation
[1] Univ So Calif, Dept Elect Engn, Speech Anal & Interpretat Lab, Los Angeles, CA 90089 USA
Keywords
Adapt language models; utterance queries; WWW corpora; in-domain documents;
DOI
10.1109/ICASSP.2010.5494928
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
In this paper, we consider the estimation of topic-specific Language Models (LMs) by exploiting documents from the World Wide Web (WWW). We focus on the quality of the generated queries and propose a novel query generation method. In contrast to the n-gram-based queries used in past work, our approach relies on utterances as query candidates. The proposed approach does not rely on any language-specific information other than the initial in-domain training text. We have conducted experiments with Web texts of size 0-150 million words, and we have shown that, despite not using any language-specific information, the proposed approach results in up to 1.1% absolute Word Error Rate (WER) improvement compared to keyword-based approaches. The proposed approach reduces the WER by 6.3% absolute in our experiments, compared to an in-domain LM that does not use any Web data.
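The abstract reports its gains as absolute WER reductions. As a minimal sketch of what that metric measures, the Python below computes WER as the word-level edit distance between a reference and a hypothesis, divided by the reference length, and then takes the absolute difference between two WER values. The function names and example sentences are hypothetical and are not taken from the paper, which does not state its baseline WER values in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Absolute reduction is a plain difference, not a ratio of the baseline."""
    return baseline_wer - new_wer


if __name__ == "__main__":
    # Hypothetical sentences, for illustration only.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")
```

An absolute reduction of 6.3% means the WER value itself drops by 6.3 percentage points; the corresponding relative reduction depends on the baseline WER.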
Pages: 5406-5409
Page count: 4