LANGUAGE MODEL ADAPTATION USING WWW DOCUMENTS OBTAINED BY UTTERANCE-BASED QUERIES

Cited by: 3
Authors
Tsiartas, Andreas [1]
Georgiou, Panayiotis [1]
Narayanan, Shrikanth [1]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Speech Anal & Interpretat Lab, Los Angeles, CA 90089 USA
Keywords
Adapt language models; utterance queries; WWW corpora; in-domain documents
DOI
10.1109/ICASSP.2010.5494928
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we consider the estimation of topic-specific Language Models (LMs) by exploiting documents from the World Wide Web (WWW). We focus on the quality of the generated queries and propose a novel query generation method. In contrast to the n-gram-based queries used in past work, our approach relies on utterances as query candidates. The proposed approach does not rely on any language-specific information other than the initial in-domain training text. We have conducted experiments with Web texts of 0-150 million words and have shown that, despite not using any language-specific information, the proposed approach yields up to a 1.1% absolute Word Error Rate (WER) improvement over keyword-based approaches. Compared to an in-domain LM that uses no Web data, the proposed approach reduces the WER by 6.3% absolute in our experiments.
Pages: 5406 - 5409
Number of pages: 4
Related papers
50 in total
  • [41] A Language Model for Improving the Graph-Based Transcription Approach for Historical Documents
    Lecireth Meza-Lovon, Graciela
    ADVANCES IN ARTIFICIAL INTELLIGENCE (IBERAMIA 2014), 2014, 8864 : 229 - 241
  • [42] Unsupervised language model adaptation using LDA-based mixture models and latent semantic marginals
    Haidar, Md. Akmal
    O'Shaughnessy, Douglas
    COMPUTER SPEECH AND LANGUAGE, 2015, 29 (01): 20 - 31
  • [43] Improving Accented Mandarin Speech Recognition by Using Recurrent Neural Network based Language Model Adaptation
    Ni, Hao
    Yi, Jiangyan
    Wen, Zhengqi
    Liu, Bin
    Tao, Jianhua
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016
  • [44] Ontology-Based Understanding of Natural Language Queries Using Nested Conceptual Graphs
    Cao, Tru H.
    Mai, Anh H.
    CONCEPTUAL STRUCTURES: FROM INFORMATION TO INTELLIGENCE, 2010, 6208 : 70 - 83
  • [45] Using a patterns-based modelling language and a model-based adaptation architecture to facilitate adaptive user interfaces
    Nilsson, Erik G.
    Floch, Jacqueline
    Hallsteinsen, Svein
    Stav, Erlend
    INTERACTIVE SYSTEMS: DESIGN, SPECIFICATION, AND VERIFICATION, 2007, 4323 : 234 - +
  • [46] Data augmentation and language model adaptation using singular value decomposition
    Béchet, F.
    De Mori, R.
    Janiszek, D.
    1600, Elsevier (25):
  • [47] Unsupervised adaptation of a stochastic Language Model using a Japanese raw corpus
    Kurata, Gakuto
    Mori, Shinsuke
    Nishimura, Masafumi
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 1037 - 1040
  • [48] Improved language model adaptation using existing and derived external resources
    Chang, PC
    Lee, LS
    ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 531 - 536
  • [49] Data augmentation and language model adaptation using singular value decomposition
    Béchet, F
    De Mori, R
    Janiszek, D
    PATTERN RECOGNITION LETTERS, 2004, 25 (01) : 15 - 19
  • [50] LANGUAGE MODEL COMBINATION AND ADAPTATION USING WEIGHTED FINITE STATE TRANSDUCERS
    Liu, X.
    Gales, M. J. F.
    Hieronymus, J. L.
    Woodland, P. C.
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5390 - 5393