Training a language model using webdata for large vocabulary Japanese spontaneous speech recognition

Cited by: 0

Authors:

Masumura, Ryo [1]

Hahm, Seongjun [1]

Ito, Akinori [1]

Affiliations:

[1] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 980, Japan

Source:

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011

Keywords:

Spontaneous speech recognition; language model; World Wide Web; large vocabulary continuous speech recognition; Corpus of Spontaneous Japanese;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes a language modeling method for spontaneous speech recognition that uses large-scale spoken language data retrieved from the Web. We downloaded 15 million Web pages covering a comprehensive range of topics. Next, spoken language-like texts were selected from the downloaded Web data using a naive Bayes classifier, and typical linguistic phenomena such as fillers and pauses were added using simulation models. A language model trained on the generated data performed as well as one trained on a large-scale spontaneous speech corpus (the Corpus of Spontaneous Japanese, CSJ). Combining the generated data with CSJ improved word accuracy.

Pages: 1476-1479

Page count: 4
