Effective database transformation and efficient support computation for mining sequential patterns

Authors
Cho, Chung-Wen [2 ]
Wu, Yi-Hung [3 ]
Chen, Arbee L. P. [1 ]
Affiliations
[1] Natl Chengchi Univ, Dept Comp Sci, Taipei, Taiwan
[2] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 30043, Taiwan
[3] Chung Yuan Christian Univ, Dept Informat & Comp Engn, Jhongli, Taiwan
Keywords
Data mining; Sequential patterns; Database transformation; Support computation; Database projection; ALGORITHM;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel algorithm for mining frequent sequences from transaction databases. The transactions of the same customer form a customer sequence. A sequence (an ordered list of itemsets) is frequent if the number of customer sequences containing it satisfies the user-specified threshold. The 1-sequence is a special type of sequence because it consists of only a single itemset instead of an ordered list, while the k-sequence is a sequence composed of k itemsets. Compared with the cost of mining frequent k-sequences (k ≥ 2), the cost of mining frequent 1-sequences is negligible. We adopt a two-phase architecture that finds the two types of frequent sequences separately, so that the discovery of frequent k-sequences can be designed and optimized on its own. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol and the database is transformed into one composed of these symbols. We find that it is unnecessary to encode all the frequent 1-sequences, and we make full use of the discovered frequent 1-sequences to transform the database into a smaller one. For every k ≥ 2, the customer sequences in the transformed database are scanned to find all the frequent k-sequences. We devise a compact representation for a customer sequence and describe a method to enumerate all distinct subsequences of a customer sequence without redundant scans. The soundness of the proposed approach is verified and a number of experiments are performed. The results show that our approach outperforms previous work in both scalability and execution time.
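The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names (`contains`, `support`, `frequent_itemsets`, `transform`), the toy database, and the threshold are all assumptions made for illustration. For simplicity it encodes every frequent 1-sequence as a symbol, whereas the paper states that encoding all of them is unnecessary, and it enumerates candidates naively rather than using the compact representation the paper describes.

```python
from itertools import combinations

def contains(customer_seq, pattern):
    """True if pattern (a list of itemsets) is contained in customer_seq in order."""
    pos = 0
    for transaction in customer_seq:
        # Greedy left-to-right matching: advance when the next itemset fits.
        if pos < len(pattern) and pattern[pos] <= transaction:
            pos += 1
    return pos == len(pattern)

def support(db, pattern):
    """Number of customer sequences in db that contain pattern."""
    return sum(contains(seq, pattern) for seq in db)

def frequent_itemsets(db, min_sup):
    """Phase 1 (naive): all itemsets whose support meets min_sup."""
    candidates = set()
    for seq in db:
        for transaction in seq:
            for r in range(1, len(transaction) + 1):
                for combo in combinations(sorted(transaction), r):
                    candidates.add(frozenset(combo))
    return {c for c in candidates if support(db, [c]) >= min_sup}

def transform(db, frequent):
    """Encode each frequent itemset as an integer symbol and rewrite db.

    Assumption: every frequent itemset gets a symbol; transactions that
    contain no frequent itemset are dropped.
    """
    ordered = sorted(frequent, key=lambda s: (len(s), sorted(s)))
    symbol = {fs: i for i, fs in enumerate(ordered)}
    new_db = []
    for seq in db:
        new_seq = []
        for transaction in seq:
            symbols = {symbol[fs] for fs in frequent if fs <= transaction}
            if symbols:
                new_seq.append(symbols)
        new_db.append(new_seq)
    return symbol, new_db

# Toy database (hypothetical): three customer sequences.
db = [
    [frozenset("ab"), frozenset("c")],
    [frozenset("a"), frozenset("c")],
    [frozenset("ab"), frozenset("bc")],
]
frequent = frequent_itemsets(db, min_sup=2)
symbol, new_db = transform(db, frequent)
two_seq_support = support(db, [frozenset("a"), frozenset("c")])
```

In the transformed database each transaction is a set of symbols, so a candidate k-sequence (k ≥ 2) can be checked by comparing symbols instead of testing itemset inclusion.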
Pages: 23-51
Number of pages: 29