Training a language model using webdata for large vocabulary Japanese spontaneous speech recognition

Cited by: 0
Authors
Masumura, Ryo [1]
Hahm, Seongjun [1]
Ito, Akinori [1]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan
Keywords
Spontaneous speech recognition; language model; World Wide Web; large vocabulary continuous speech recognition; Corpus of Spontaneous Japanese;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a language modeling method for spontaneous speech recognition that uses large-scale spoken language data retrieved from the Web. We downloaded 15 million Web pages covering a comprehensive range of topics. Next, spoken language-like texts were selected from the downloaded Web data using a naive Bayes classifier, and typical linguistic phenomena such as fillers and pauses were added using simulation models. A language model trained on the generated data performed as well as one trained on a large-scale spontaneous speech corpus (Corpus of Spontaneous Japanese, CSJ). By combining the generated data and CSJ, we improved word accuracy.
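The text-selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the features, smoothing, or training data the authors used, so everything below is an assumption made for illustration: a multinomial naive Bayes over word tokens with add-one smoothing, the hypothetical labels "spoken" and "written", a tiny English toy corpus standing in for tokenized Japanese text, and hypothetical function names (`train_nb`, `log_score`, `select_spoken_like`).

```python
import math
from collections import Counter

# Toy training data (assumption): token lists labeled as spoken-like or written-like.
TOY_DATA = {
    "spoken": [
        ["um", "i", "think", "so"],
        ["well", "you", "know", "it", "is", "good"],
    ],
    "written": [
        ["the", "results", "are", "shown"],
        ["this", "paper", "describes", "the", "method"],
    ],
}


def train_nb(docs_by_label, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods for each label."""
    vocab = {tok for docs in docs_by_label.values() for doc in docs for tok in doc}
    total_docs = sum(len(docs) for docs in docs_by_label.values())
    model = {}
    for label, docs in docs_by_label.items():
        counts = Counter(tok for doc in docs for tok in doc)
        # Add-alpha smoothing over the shared vocabulary.
        denom = sum(counts.values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(len(docs) / total_docs),
            "log_lik": {t: math.log((counts[t] + alpha) / denom) for t in vocab},
        }
    return model


def log_score(model, label, tokens):
    """Unnormalized log posterior of a label; out-of-vocabulary tokens are skipped."""
    params = model[label]
    return params["log_prior"] + sum(
        params["log_lik"][t] for t in tokens if t in params["log_lik"]
    )


def select_spoken_like(model, sentences):
    """Keep only sentences scored as more spoken-like than written-like."""
    return [
        s for s in sentences
        if log_score(model, "spoken", s) > log_score(model, "written", s)
    ]


model = train_nb(TOY_DATA)
candidates = [["um", "you", "know"], ["the", "paper", "describes"]]
selected = select_spoken_like(model, candidates)
```

With this toy corpus, only the first candidate sentence is kept. A real pipeline would need a Japanese morphological analyzer for tokenization and far larger labeled samples; the filler and pause simulation models are not sketched here because the abstract gives no detail about them.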
Pages: 1476-1479
Page count: 4