Effective database transformation and efficient support computation for mining sequential patterns

Cited by: 0
Authors
Cho, Chung-Wen [2 ]
Wu, Yi-Hung [3 ]
Chen, Arbee L. P. [1 ]
Affiliations
[1] Natl Chengchi Univ, Dept Comp Sci, Taipei, Taiwan
[2] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 30043, Taiwan
[3] Chung Yuan Christian Univ, Dept Informat & Comp Engn, Jhongli, Taiwan
Keywords
Data mining; Sequential patterns; Database transformation; Support computation; Database projection; ALGORITHM;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel algorithm for mining frequent sequences from transaction databases. The transactions of the same customer form a customer sequence. A sequence (an ordered list of itemsets) is frequent if the number of customer sequences containing it satisfies the user-specified threshold. The 1-sequence is a special type of sequence because it consists of only a single itemset instead of an ordered list, while the k-sequence is a sequence composed of k itemsets. Compared with the cost of mining frequent k-sequences (k ≥ 2), the cost of mining frequent 1-sequences is negligible. We adopt a two-phase architecture that finds the two types of frequent sequences separately, so that the discovery of frequent k-sequences can be designed and optimized on its own. For efficient frequent k-sequence mining, every frequent 1-sequence is encoded as a unique symbol and the database is transformed into one constituted by the symbols. We find that it is unnecessary to encode all the frequent 1-sequences, and we make full use of the discovered frequent 1-sequences to transform the database into a smaller one. For every k ≥ 2, the customer sequences in the transformed database are scanned to find all the frequent k-sequences. We devise a compact representation for a customer sequence and elaborate a method to enumerate all distinct subsequences of a customer sequence without redundant scans. The soundness of the proposed approach is verified and a number of experiments are performed. The results show that our approach outperforms previous work in both scalability and execution time.
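The definitions in the abstract (containment, support, and symbol encoding of frequent 1-sequences) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it is a brute-force baseline that encodes every frequent 1-sequence, whereas the paper reports that encoding all of them is unnecessary. The function names, the integer symbols, and the toy database `example_db` are hypothetical and introduced only for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
CustomerSequence = List[Itemset]  # one customer's transactions, in time order


def contains(customer_seq: CustomerSequence, pattern: List[Itemset]) -> bool:
    """True if each itemset of `pattern` is a subset of a distinct
    transaction of `customer_seq`, with the order preserved."""
    pos = 0
    for itemset in pattern:
        # Greedily take the earliest transaction that covers this itemset.
        while pos < len(customer_seq) and not itemset <= customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True


def support(database: List[CustomerSequence], pattern: List[Itemset]) -> int:
    """Number of customer sequences that contain `pattern`."""
    return sum(1 for seq in database if contains(seq, pattern))


def frequent_1_sequences(
    database: List[CustomerSequence], min_support: int
) -> Dict[Itemset, int]:
    """Brute force: count each itemset once per customer that has it
    inside a single transaction, then keep those meeting the threshold."""
    counts: Dict[Itemset, int] = {}
    for seq in database:
        seen = set()
        for transaction in seq:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                for combo in combinations(items, size):
                    seen.add(frozenset(combo))
        for itemset in seen:
            counts[itemset] = counts.get(itemset, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}


def transform(
    database: List[CustomerSequence], frequent: Dict[Itemset, int]
) -> Tuple[Dict[Itemset, int], List[List[FrozenSet[int]]]]:
    """Assumed naive encoding: give every frequent 1-sequence an integer
    symbol and rewrite each transaction as the set of symbols it covers.
    Transactions that cover no frequent itemset are dropped."""
    ordered = sorted(frequent, key=lambda s: (len(s), sorted(s)))
    symbol_of = {itemset: i for i, itemset in enumerate(ordered)}
    transformed = []
    for seq in database:
        new_seq = []
        for transaction in seq:
            symbols = frozenset(
                sym for itemset, sym in symbol_of.items() if itemset <= transaction
            )
            if symbols:
                new_seq.append(symbols)
        transformed.append(new_seq)
    return symbol_of, transformed


# Hypothetical toy database: three customers, two transactions each.
example_db: List[CustomerSequence] = [
    [frozenset("ab"), frozenset("c")],
    [frozenset("a"), frozenset("bc")],
    [frozenset("ab"), frozenset("bc")],
]
frequent = frequent_1_sequences(example_db, min_support=2)
symbol_of, transformed_db = transform(example_db, frequent)
```

In this sketch a frequent 2-sequence such as ⟨{a}, {c}⟩ is checked against the original itemsets; after the transformation the same check could be run on pairs of symbols instead, which is the step the paper's compact representation and subsequence enumeration are designed to speed up.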
Pages: 23-51
Page count: 29
Related papers
50 in total
  • [21] BFSPMiner: an effective and efficient batch-free algorithm for mining sequential patterns over data streams
    Marwan Hassani
    Daniel Töws
    Alfredo Cuzzocrea
    Thomas Seidl
    International Journal of Data Science and Analytics, 2019, 8 : 223 - 239
  • [22] Efficient mining of sequential patterns with time constraints: Reducing the combinations
    Masseglia, F.
    Poncelet, P.
    Teisseire, M.
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (02) : 2677 - 2690
  • [23] An Efficient Parallel Method for Mining Frequent Closed Sequential Patterns
    Bao Huynh
    Bay Vo
    Snasel, Vaclav
    IEEE ACCESS, 2017, 5 : 17392 - 17402
  • [24] An efficient data mining technique for discovering interesting sequential patterns
    Yen, SJ
    Lee, YS
    2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001, : 663 - 664
  • [25] Efficient mining gapped sequential patterns for motifs in biological sequences
    Liao, Vance Chiang-Chi
    Chen, Ming-Syan
    BMC SYSTEMS BIOLOGY, 2013, 7
  • [26] Two efficient algorithms for mining high utility sequential patterns
    Zhang, Chunkai
    Zu, Yiwen
    Nie, Junli
    Du, Linzi
    2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 905 - 911
  • [27] Efficient Mining of Maximal Sequential Patterns Using Multiple Samples
    Luo, Congnan
    Chung, Soon M.
    PROCEEDINGS OF THE FIFTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2005, : 415 - 426
  • [28] An Efficient Approach for Mining Weighted Sequential Patterns in Dynamic Databases
    Ishita, Sabrina Zaman
    Noor, Faria
    Ahmed, Chowdhury Farhan
    ADVANCES IN DATA MINING: APPLICATIONS AND THEORETICAL ASPECTS (ICDM 2018), 2018, 10933 : 215 - 229
  • [29] Efficient mining of partial periodic patterns in time series database
    Han, JW
    Dong, GZ
    Yin, YW
    15TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1999, : 106 - 115
  • [30] Efficient computation of wave transformation matrices to support coastal management
    Carapuco, M. M.
    Taborda, R.
    ESTUARINE COASTAL AND SHELF SCIENCE, 2025, 313