Inverted Index Automata Frequent Itemset Mining for Large Dataset Frequent Itemset Mining

Cited by: 0
Authors
Dai, Xin [1]
Hamed, Haza Nuzly Abdull [1]
Su, Qichen [1]
Hao, Xue [1]
Affiliation
[1] Univ Teknol Malaysia UTM, Fac Comp, Johor Baharu 81310, Johor, Malaysia
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Automata; Data mining; Itemsets; Memory management; Computational efficiency; Complexity theory; Real-time systems; Heuristic algorithms; Indexing; Distributed databases; Frequent itemset mining; inverted index; finite automata; depth-first search; large-scale data analysis; ASSOCIATION RULES; ALGORITHM; IMPLEMENTATION;
DOI
10.1109/ACCESS.2024.3521285
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Frequent itemset mining (FIM) faces significant challenges as datasets grow in scale. Traditional algorithms such as Apriori, FP-Growth, and Eclat scale poorly and run inefficiently on modern datasets characterized by high dimensionality and high density. These algorithms demand substantial memory and multiple database scans, which limits their practicality for rapid data processing. To address these challenges, this study proposes the Inverted Index Automata Frequent Itemset Mining (IA-FIM) algorithm. IA-FIM combines the fast retrieval of an inverted index with the pattern recognition capability of finite automata, enabling efficient processing of large datasets. Unlike conventional FIM algorithms, IA-FIM uses an inverted index automaton to reduce the search space and memory footprint, eliminating repeated database scans and multiple tree constructions. The proposed algorithm employs a single-pass scan strategy, constructing a dynamic and adjustable inverted index that gives a compact representation of the data. IA-FIM demonstrates superior performance on large sparse datasets, increasing processing speed and meeting the demands of the big data era. By reducing repeated scans and heavy memory dependence, it also improves the efficiency and practicality of FIM on large datasets.
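The abstract names two ingredients that are easy to illustrate in isolation: an inverted index built in a single pass over the transactions, and a depth-first search over candidate itemsets. The sketch below is not the authors' IA-FIM. It omits the automaton structure, which the abstract does not specify, and uses a plain Eclat-style intersection of transaction-ID sets instead. The function names `build_inverted_index` and `mine_frequent_itemsets` and the absolute `min_support` count are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(transactions):
    """Single pass over the data: map each item to the set of
    transaction ids (tids) in which it occurs."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset_tuple: support_count} for every itemset whose
    support is at least min_support (an absolute count, by assumption)."""
    index = build_inverted_index(transactions)
    # Infrequent single items cannot appear in any frequent itemset,
    # so they are dropped before the search starts.
    frequent_items = sorted(
        item for item, tids in index.items() if len(tids) >= min_support
    )
    results = {}

    def dfs(prefix, prefix_tids, start):
        # Extend the current prefix with each later item, depth first.
        for pos in range(start, len(frequent_items)):
            item = frequent_items[pos]
            # Support of prefix + item = size of the tid-set intersection;
            # the original transactions are never rescanned.
            tids = (prefix_tids & index[item]) if prefix else index[item]
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                results[itemset] = len(tids)
                dfs(itemset, tids, pos + 1)

    dfs((), set(), 0)
    return results


if __name__ == "__main__":
    data = [
        {"a", "b", "c"},
        {"a", "b"},
        {"a", "c"},
        {"b", "c"},
        {"a", "b", "c"},
    ]
    for itemset, support in sorted(mine_frequent_itemsets(data, 3).items()):
        print(itemset, support)
```

Because every support count comes from intersecting tid sets held in the index, the transaction list is read only once, which is the property the abstract attributes to the single-pass strategy. Memory use in this sketch still grows with the size of the tid sets, so it says nothing about the memory savings reported for IA-FIM.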
Pages: 195111-195130
Page count: 20