LANGUAGE MODEL ADAPTATION USING WWW DOCUMENTS OBTAINED BY UTTERANCE-BASED QUERIES

Cited by: 3
Authors
Tsiartas, Andreas [1 ]
Georgiou, Panayiotis [1 ]
Narayanan, Shrikanth [1 ]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Speech Anal & Interpretat Lab, Los Angeles, CA 90089 USA
Keywords
Adapt language models; utterance queries; WWW corpora; in-domain documents;
DOI
10.1109/ICASSP.2010.5494928
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In this paper, we consider the estimation of topic-specific language models (LMs) by exploiting documents from the World Wide Web (WWW). We focus on the quality of the generated queries and propose a novel query generation method. In contrast to the n-gram-based queries used in past work, our approach relies on utterances as query candidates. The proposed approach does not rely on any language-specific information other than the initial in-domain training text. We conducted experiments with Web text collections ranging from 0 to 150 million words and show that, despite using no language-specific information, the proposed approach yields up to 1.1% absolute Word Error Rate (WER) improvement over keyword-based approaches. Compared to an in-domain LM that uses no Web data, the proposed approach reduces the WER by 6.3% absolute in our experiments.
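The contrast the abstract draws between n-gram-based and utterance-based queries can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not say how utterances are selected or ranked, so the length filter, the deduplication, the function names (`ngram_queries`, `utterance_queries`), and the sample utterances are all hypothetical choices made for illustration.

```python
from collections import Counter


def ngram_queries(utterances, n=2, top_k=3):
    """Keyword-style baseline (assumed): the top_k most frequent word n-grams."""
    counts = Counter()
    for utt in utterances:
        words = utt.lower().split()
        # Slide a window of n words over the utterance and count each n-gram.
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return [gram for gram, _ in counts.most_common(top_k)]


def utterance_queries(utterances, min_words=3, max_words=8):
    """Utterance-style candidates (assumed filter): whole in-domain
    utterances, deduplicated and quoted as exact-phrase queries."""
    seen = set()
    queries = []
    for utt in utterances:
        words = utt.lower().split()
        key = " ".join(words)
        # Hypothetical heuristic: skip very short or very long utterances.
        if min_words <= len(words) <= max_words and key not in seen:
            seen.add(key)
            queries.append(f'"{key}"')
    return queries


# Made-up in-domain training utterances, for illustration only.
in_domain = [
    "where does it hurt",
    "do you have any allergies",
    "where does it hurt",
    "yes",
]

print(ngram_queries(in_domain, n=2, top_k=3))
print(utterance_queries(in_domain))
```

Neither function uses language-specific resources such as stop-word lists or part-of-speech tags; both operate only on the in-domain text, which matches the constraint stated in the abstract. Submitting the resulting queries to a search engine and filtering the retrieved documents are outside the scope of this sketch.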
Pages: 5406 - 5409
Number of pages: 4