Effective database transformation and efficient support computation for mining sequential patterns

Authors
Cho, Chung-Wen [2 ]
Wu, Yi-Hung [3 ]
Chen, Arbee L. P. [1 ]
Affiliations
[1] Natl Chengchi Univ, Dept Comp Sci, Taipei, Taiwan
[2] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 30043, Taiwan
[3] Chung Yuan Christian Univ, Dept Informat & Comp Engn, Jhongli, Taiwan
Keywords
Data mining; Sequential patterns; Database transformation; Support computation; Database projection; ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel algorithm for mining frequent sequences from transaction databases. The transactions of the same customers form a set of customer sequences. A sequence (an ordered list of itemsets) is frequent if the number of customer sequences containing it satisfies the user-specified threshold. The 1-sequence is a special type of sequence because it consists of only a single itemset instead of an ordered list, while the k-sequence is a sequence composed of k itemsets. Compared with the cost of mining frequent k-sequences (k ≥ 2), the cost of mining frequent 1-sequences is negligible. We adopt a two-phase architecture that finds the two types of frequent sequences separately, so that the discovery of frequent k-sequences can be well designed and optimized. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol and the database is transformed into one constituted by the symbols. We find that it is unnecessary to encode all the frequent 1-sequences, and we make full use of the discovered frequent 1-sequences to transform the database into one with a smaller size. For every k ≥ 2, the customer sequences in the transformed database are scanned to find all the frequent k-sequences. We devise a compact representation for a customer sequence and elaborate a method to enumerate all distinct subsequences of a customer sequence without redundant scans. The soundness of the proposed approach is verified and a number of experiments are performed. The results show that our approach outperforms previous works in both scalability and execution time.
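The definitions in the abstract (a sequence as an ordered list of itemsets, and support as the number of customer sequences containing it) can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits the symbol encoding, the database transformation, and the compact representation, and simply counts support by direct containment checks. The names `contains`, `support`, `is_frequent`, and `seq`, as well as the toy database, are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]
Sequence_ = List[Itemset]  # an ordered list of itemsets


def contains(customer_seq: Sequence_, pattern: Sequence_) -> bool:
    """Return True if `pattern` is contained in `customer_seq`.

    Each itemset of the pattern must be a subset of some transaction,
    and the matched transactions must appear in the same order.
    Greedy earliest matching is sufficient for this check.
    """
    pos = 0
    for itemset in pattern:
        # Advance to the first remaining transaction that covers this itemset.
        while pos < len(customer_seq) and not itemset <= customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # the next pattern itemset must match a later transaction
    return True


def support(database: List[Sequence_], pattern: Sequence_) -> int:
    """Number of customer sequences that contain `pattern`."""
    return sum(1 for customer_seq in database if contains(customer_seq, pattern))


def is_frequent(database: List[Sequence_], pattern: Sequence_, min_support: int) -> bool:
    """A pattern is frequent if its support meets the user-specified threshold."""
    return support(database, pattern) >= min_support


def seq(*transactions: str) -> Sequence_:
    """Helper (assumption): build a sequence from strings of single-letter items."""
    return [frozenset(t) for t in transactions]


# Toy database of three customer sequences (illustrative data only).
database = [
    seq("ab", "c", "d"),
    seq("a", "bc"),
    seq("b", "ac", "d"),
]

one_sequence = seq("b")       # a 1-sequence: a single itemset
two_sequence = seq("a", "c")  # a 2-sequence: item a, later followed by item c

print(support(database, one_sequence))
print(support(database, two_sequence))
```

In the third customer sequence, items a and c occur in the same transaction, so the 2-sequence (a followed later by c) is not contained in it; this is the distinction between a single itemset and an ordered list that the abstract draws.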
Pages: 23-51
Number of pages: 29
Related papers
  • [1] Effective database transformation and efficient support computation for mining sequential patterns
    Cho, CW
    Wu, YH
    Chen, ALP
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2005, 3453 : 163 - 174
  • [2] Effective database transformation and efficient support computation for mining sequential patterns
    Chung-Wen Cho
    Yi-Hung Wu
    Arbee L. P. Chen
    Journal of Intelligent Information Systems, 2009, 32 (1) : 23 - 51
  • [3] Database support for data mining patterns
    Kotsifakos, E
    Ntoutsi, I
    Theodoridis, Y
    ADVANCES IN INFORMATICS, PROCEEDINGS, 2005, 3746 : 14 - 24
  • [4] An Effective Approach for Mining Weighted Sequential Patterns
    Patel, Mukesh
    Modi, Nilesh
    Passi, Kalpdrum
    SMART TRENDS IN INFORMATION TECHNOLOGY AND COMPUTER COMMUNICATIONS, SMARTCOM 2016, 2016, 628 : 904 - 915
  • [5] Approximate sequential patterns for incomplete sequence database mining
    Fiot, Celine
    Laurent, Anne
    Teisseire, Maguelonne
    2007 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-4, 2007, : 663 - 668
  • [6] An efficient algorithm for incremental mining of sequential patterns
    Ren, Jia-Dong
    Zhou, Xiao-Lei
    ADVANCES IN MACHINE LEARNING AND CYBERNETICS, 2006, 3930 : 179 - 188
  • [7] A high efficient algorithm of mining sequential patterns
    Qin, F
    Yang, XB
    PROCEEDINGS OF THE 3RD WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-5, 2000, : 3750 - 3752
  • [8] Efficient Mining of Outlying Sequential Behavior Patterns
    Xu, Yifan
    Duan, Lei
    Xie, Guicai
    Fu, Min
    Li, Longhai
    Nummenmaa, Jyrki
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 325 - 341
  • [9] An efficient method for mining sequential patterns with indices
    Huynh, Huy Minh
    Nguyen, Loan T. T.
    Pham, Nam Ngoc
    Oplatkova, Zuzana Kominkova
    Yun, Unil
    Vo, Bay
    KNOWLEDGE-BASED SYSTEMS, 2022, 239
  • [10] NSPIS: Mining Negative Sequential Patterns with Individual Support
    Huang, Gengsen
    Gan, Wensheng
    Huang, Shan
    Chen, Jiahui
    Chen, Chien-Ming
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 5507 - 5516