Automatic word spacing using probabilistic models based on character n-grams

Cited by: 13

Authors:

Lee, Do-Gil [1]

Rim, Hae-Chang [1]

Yook, Dongsuk [1]

Affiliations:

[1] Korea Univ, Dept Comp Sci & Engn, Seoul 136701, South Korea

Source:

IEEE INTELLIGENT SYSTEMS | 2007 / Vol. 22 / No. 01

Keywords:

Probabilistic logics;

DOI:

10.1109/MIS.2007.4

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Probabilistic models for automatic word spacing that are based on hidden Markov models (HMMs) and use character n-grams are discussed; a character n-gram is a subsequence of n characters in a given character sequence. Automatic word spacing is a preprocessing technique for correcting the boundaries between words in a sentence that contains spacing errors. These models can be effectively applied to a natural language with a small character set, such as English, by using character n-grams larger than trigrams. The models are language independent: they can be used effectively for languages that have word spacing, and they can also be used for word segmentation in languages without explicit word spacing. By generalizing HMMs, the models can consider a broad context and estimate accurate probabilities.
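The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a simplified bigram variant in which each character carries a tag (1 if a space follows it, 0 otherwise) and the score of a tagging is the product of P(char, tag | previous char, previous tag), decoded with Viterbi search. The function names (`to_tagged`, `train`, `log_prob`, `space`), the start symbol `<s>`, and the additive smoothing constant `alpha` are all hypothetical choices made for this illustration.

```python
import math
from collections import defaultdict


def to_tagged(sentence):
    """Turn a spaced sentence into (char, tag) pairs; tag 1 means a space follows."""
    chars = sentence.strip()
    pairs = []
    for i, ch in enumerate(chars):
        if ch == " ":
            continue
        follows = i + 1 < len(chars) and chars[i + 1] == " "
        pairs.append((ch, 1 if follows else 0))
    return pairs


def train(sentences):
    """Count (prev_char, prev_tag) -> (char, tag) transitions in spaced text."""
    counts = defaultdict(lambda: defaultdict(int))
    alphabet = set()
    for sentence in sentences:
        prev = ("<s>", 1)  # assumed start symbol, treated like a word boundary
        for pair in to_tagged(sentence):
            counts[prev][pair] += 1
            alphabet.add(pair[0])
            prev = pair
    # Each character can appear with tag 0 or tag 1, hence 2 * |alphabet| outcomes.
    return counts, 2 * max(len(alphabet), 1)


def log_prob(counts, prev, cur, vocab_size, alpha=0.1):
    """Additively smoothed log P(cur | prev)."""
    row = counts.get(prev, {})
    total = sum(row.values())
    return math.log((row.get(cur, 0) + alpha) / (total + alpha * vocab_size))


def space(text, model, alpha=0.1):
    """Re-space text by Viterbi search over the per-character space tags."""
    counts, vocab_size = model
    chars = [c for c in text if c != " "]  # discard the existing spacing
    if not chars:
        return ""
    start = ("<s>", 1)
    # best[t] = (log score, tag list) of the best path whose last tag is t
    best = {
        t: (log_prob(counts, start, (chars[0], t), vocab_size, alpha), [t])
        for t in (0, 1)
    }
    for i in range(1, len(chars)):
        new_best = {}
        for t in (0, 1):
            candidates = []
            for p in (0, 1):
                step = log_prob(
                    counts, (chars[i - 1], p), (chars[i], t), vocab_size, alpha
                )
                candidates.append((best[p][0] + step, best[p][1] + [t]))
            new_best[t] = max(candidates, key=lambda c: c[0])
        best = new_best
    _, tags = max(best.values(), key=lambda c: c[0])
    out = []
    for ch, t in zip(chars, tags):
        out.append(ch)
        if t:
            out.append(" ")
    return "".join(out).strip()


model = train(["ab cd"])
print(space("abcd", model))  # prints "ab cd"
```

A bigram context is used here only to keep the sketch short; the abstract's point about considering a broader context would correspond to conditioning on longer character and tag histories, which this sketch does not attempt.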


Pages: 28-35

Page count: 8
