Study for the Double-array Trie Tree Based Algorithm in Word Segmentation

Authors
Yang, Wenchuan [1 ]
Fang, Zeyang [1 ]
Li, Pengfei [1 ]
Affiliation
[1] Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China
Keywords
double-array; trie tree; time complexity; word segmentation dictionary
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
This paper presents iDAT, an improved algorithm based on the Double-Array Trie Tree for Chinese word segmentation dictionaries. A Chinese word segmentation dictionary based on the Double-Array Trie Tree offers high search efficiency, but dynamic insertion consumes a lot of time. After initializing the original dictionary, we apply a hash process to the index values of the empty sequences in the base array. The final hash table stores the sum of the empty sequences that precede the current empty sequence. The algorithm also adopts the jump strategy of the Sunday single-pattern matching algorithm. At the cost of a slight and reasonable increase in space, iDAT reduces the average time complexity of dynamic insertion in the Trie Tree. Practical results show that it performs well.
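For readers unfamiliar with the Sunday jump mentioned in the abstract, the following is a minimal sketch of the standard Sunday single-pattern matching algorithm on plain strings. It is not the iDAT insertion procedure; the abstract does not say how the jump is applied to the base array, so the function name `sunday_search` and its string-based interface are assumptions made purely for illustration.

```python
def sunday_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Illustrative sketch of the classic Sunday algorithm; the name and
    interface are hypothetical and do not come from the paper.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # For each character, the distance from its last occurrence in the
    # pattern to the position one past the end of the pattern.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    pos = 0
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        if pos + m >= n:
            break  # no character after the window, so no further jump
        # Sunday jump: look at the character just past the current window.
        # If it does not occur in the pattern, skip the whole window plus one.
        pos += shift.get(text[pos + m], m + 1)
    return -1


if __name__ == "__main__":
    print(sunday_search("here is a simple example", "example"))
```

The point of the jump is that a mismatch can advance the window by up to `m + 1` positions at once, rather than one position at a time.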
Pages: 440-446
Number of pages: 7
Related papers
50 in total
  • [31] Automatic word segmentation for Chinese classics of tea based on tree-pruning
    Fang, Miao
    Jiang, Yi
    Zhao, Qi
    Jiang, Xin
    2009 SECOND INTERNATIONAL SYMPOSIUM ON KNOWLEDGE ACQUISITION AND MODELING: KAM 2009, VOL 1, 2009, : 438 - +
  • [32] A multiscale image segmentation algorithm based on binary partition tree
    Liu, Zhi
    Shen, Liquan
    Zhang, Zhaoyang
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2009, 21 (09): : 1321 - 1327
  • [33] Using directed graph based BDMM algorithm for Chinese word segmentation
    Chen, YD
    Wang, T
    Chen, HW
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2005, 3406 : 214 - 217
  • [34] Research on Improved Algorithm for Chinese Word Segmentation Based on Markov Chain
    Pang Baomao
    Shi Haoshan
    FIFTH INTERNATIONAL CONFERENCE ON INFORMATION ASSURANCE AND SECURITY, VOL 1, PROCEEDINGS, 2009, : 236 - 238
  • [35] A Study of Chinese Word Segmentation Based on the Characteristics of Chinese
    Han, Aaron Li-Feng
    Wong, Derek F.
    Chao, Lidia S.
    He, Liangye
    Zhu, Ling
    Li, Shuo
    LANGUAGE PROCESSING AND KNOWLEDGE IN THE WEB, 2013, 8105 : 111 - 118
  • [36] Double array structures based on byte segmentation for n-gram
    Fuketa, Masao
    Morita, Kazuhiro
    Aoe, Jun-Ichi
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2015, 52 (2-3) : 110 - 116
  • [37] A fast directed tree based neighborhood clustering algorithm for image segmentation
    Ding, Jundi
    Chen, SongCan
    Ma, RuNing
    Wang, Bo
    NEURAL INFORMATION PROCESSING, PT 2, PROCEEDINGS, 2006, 4233 : 369 - 378
  • [38] A Probability Model Chinese Word Segmentation Algorithm Based on Aho-Corasick Automata Algorithm
    Xu Y.-B.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2017, 46 (02): : 426 - 433
  • [39] A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation
    Lavner, Yizhar
    Ruinskiy, Dima
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2009,
  • [40] Maximum Inter Class Variance Segmentation Algorithm Based on Decision Tree
    Yi, Sanli
    Zhang, Guifang
    He, Jianfeng
    INTERNATIONAL JOURNAL OF INFORMATION SYSTEMS IN THE SERVICE SECTOR, 2019, 11 (02) : 72 - 87