Hierarchical Pattern Mining with the Automata Processor

Cited by: 4
Authors
Wang, Ke [1]
Sadredini, Elaheh [1]
Skadron, Kevin [1]
Affiliations
[1] Univ Virginia, Dept Comp Sci, 85 Engineers Way, Charlottesville, VA 22904 USA
Funding
National Science Foundation (USA)
Keywords
Data mining; Automata Processor; Sequential pattern mining; Disjunctive rule mining; Finite automaton; SEQUENTIAL PATTERNS; PARALLEL; ALGORITHMS; EFFICIENT;
DOI
10.1007/s10766-017-0489-y
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Mining complex patterns with hierarchical structures is becoming increasingly important for understanding the underlying information in large, unstructured databases. Compared with set-mining or string-mining problems, the computational complexity of recognizing a pattern with hierarchical structure, together with the large associated search space, makes hierarchical pattern mining (HPM) extremely expensive on conventional processor architectures. We propose a flexible, hardware-accelerated framework for mining hierarchical patterns with Apriori-based algorithms, which lead to multi-pass pruning strategies but expose massive parallelism. Under this framework, we implemented two widely used HPM techniques, sequential pattern mining (SPM) and disjunctive rule mining (DRM), on the Automata Processor (AP), a hardware implementation of non-deterministic finite automata (NFAs). We propose two automaton-design strategies for matching and counting different types of hierarchical patterns: linear design and reduction design. To generalize the automaton structure for SPM, the linear design strategy flattens sequential patterns to plain strings to produce the automaton design space and to minimize reconfiguration overhead. On six real-world datasets, the AP-accelerated algorithm achieves speedups of up to 90× and 29× over optimized multicore CPU and GPU GSP implementations, respectively. The proposed CPU-AP solution also outperforms the state-of-the-art PrefixSpan and SPADE algorithms on a multicore CPU, with speedups of up to 452× and 49×. The AP advantage grows further with larger datasets. For DRM, the reduction design strategy applies an AND reduction, using on-chip Boolean units, over several parallel sub-structures that recognize disjunctive items.
This strategy allows an implicit OR reduction over the alternative items within a disjunctive item by exploiting the bit-wise parallelism of the on-chip state units. Experiments on two real-world datasets show speedups of up to 614× for the proposed CPU-AP DRM solution over a sequential CPU algorithm. They also show that CPU matching-and-counting time increases significantly as the d-rule size or the number of alternative items grows. In practical cases, however, the AP solution performs matching and counting hundreds of times faster than the CPU solution and maintains constant processing time as disjunctive rules become more complex.
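As a rough software illustration of the two matching-and-counting ideas named in the abstract, the Python sketch below emulates them on a CPU. It is not the authors' AP implementation. All names here (`DELIM`, `flatten`, `contains_pattern`, `sequence_support`, `matches_drule`, `drule_support`) and the choice of `|` as an itemset delimiter are assumptions made for this example. The first part flattens itemset sequences into a plain symbol stream and scans it linearly; the second part evaluates a disjunctive rule as an AND over per-item OR groups.

```python
from typing import AbstractSet, List, Sequence

DELIM = "|"  # hypothetical itemset delimiter; items are assumed never to equal it


def flatten(sequence: Sequence[AbstractSet[str]]) -> List[str]:
    """Turn a sequence of itemsets into one symbol stream.

    Items are sorted inside each itemset and DELIM closes every itemset.
    """
    stream: List[str] = []
    for itemset in sequence:
        stream.extend(sorted(itemset))
        stream.append(DELIM)
    return stream


def contains_pattern(pattern: Sequence[AbstractSet[str]],
                     sequence: Sequence[AbstractSet[str]]) -> bool:
    """Return True if each pattern itemset is a subset of a distinct
    itemset of `sequence`, in order (greedy earliest match)."""
    if not pattern:
        return True
    wanted = [sorted(itemset) for itemset in pattern]
    k = 0  # index of the pattern itemset currently being matched
    j = 0  # number of items of that itemset matched in the current input itemset
    for symbol in flatten(sequence):
        if symbol == DELIM:
            if j == len(wanted[k]):  # whole pattern itemset was found
                k += 1
                if k == len(wanted):
                    return True
            j = 0  # a new input itemset starts; partial matches are dropped
        elif j < len(wanted[k]) and symbol == wanted[k][j]:
            j += 1
    return False


def sequence_support(pattern: Sequence[AbstractSet[str]],
                     database: Sequence[Sequence[AbstractSet[str]]]) -> int:
    """Count the sequences in `database` that contain `pattern`."""
    return sum(contains_pattern(pattern, seq) for seq in database)


def matches_drule(rule: Sequence[AbstractSet[str]],
                  transaction: AbstractSet[str]) -> bool:
    """AND over disjunctive items, OR over the alternatives inside each one."""
    return all(any(alt in transaction for alt in ditem) for ditem in rule)


def drule_support(rule: Sequence[AbstractSet[str]],
                  transactions: Sequence[AbstractSet[str]]) -> int:
    """Count the transactions that satisfy the disjunctive rule."""
    return sum(matches_drule(rule, t) for t in transactions)
```

In this sketch, a rule such as `[{"a", "b"}, {"c"}]` reads as "(a OR b) AND c". Each candidate is checked independently of the others, which is the property that lets a hardware matcher evaluate many candidates in parallel over a single pass of the input.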
Pages: 376-411
Page count: 36
Related papers
50 in total
  • [1] Hierarchical Pattern Mining with the Automata Processor
    Ke Wang
    Elaheh Sadredini
    Kevin Skadron
    International Journal of Parallel Programming, 2018, 46 : 376 - 411
  • [2] Sequential Pattern Mining with the Micron Automata Processor
    Wang, Ke
    Sadredini, Elaheh
    Skadron, Kevin
    PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS (CF'16), 2016, : 135 - 144
  • [3] HIERARCHICAL TEMPORAL MEMORY ON THE AUTOMATA PROCESSOR
    Putic, Mateja
    Varshneya, A. J.
    Stan, Mircea R.
    IEEE MICRO, 2017, 37 (01) : 52 - 59
  • [4] Association Rule Mining with the Micron Automata Processor
    Wang, Ke
    Qi, Yanjun
    Fox, Jeffrey J.
    Stan, Mircea R.
    Skadron, Kevin
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 689 - 699
  • [5] High Performance Pattern Matching using the Automata Processor
    Roy, Indranil
    Srivastava, Ankit
    Nourian, Marziyeh
    Becchi, Michela
    Aluru, Srinivas
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016), 2016, : 1123 - 1132
  • [6] Evaluating High Performance Pattern Matching on the Automata Processor
    Roy, Indranil
    Srivastava, Ankit
    Grimm, Matt
    Nourian, Marziyeh
    Becchi, Michela
    Aluru, Srinivas
    IEEE TRANSACTIONS ON COMPUTERS, 2019, 68 (08) : 1201 - 1212
  • [7] LAP: A Lightweight Automata Processor for Pattern Matching Tasks
    Xia, Haojun
    Gong, Lei
    Wang, Chao
    Chen, Xianglan
    Zhou, Xuehai
    PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021), 2021, : 844 - 849
  • [8] Mining probabilistic automata: a statistical view of sequential pattern mining
    Jacquemont, Stephanie
    Jacquenet, Francois
    Sebban, Marc
    MACHINE LEARNING, 2009, 75 (01) : 91 - 127
  • [9] Tree pattern mining with tree automata constraints
    de Amo, Sandra
    Silva, Nyara A.
    Silva, Ronaldo P.
    Pereira, Fabiola S.
    INFORMATION SYSTEMS, 2010, 35 (05) : 570 - 591