Sampling Based on Genetic Algorithm for Data Mining

Authors
Wang Jianyong [1]
Huang Yu [1]
Hu Bin [1]
Wei Xiaomei [1]
Affiliation
[1] Huazhong Agr Univ, Coll Sci, Wuhan 430070, Hubei Province, Peoples R China
Keywords
Genetic algorithm; Association rules; Accuracy
DOI
Not available
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Collecting a large initial sample from a huge data set, and then distilling a smaller sample from that initial sample at the same accuracy, can greatly speed up data mining algorithms. Because the distilling step has been proven to be an NP-hard problem, the two-phase sampling algorithm FAST uses a greedy method. This paper presents SGA, a sampling algorithm that applies a genetic algorithm to sample distilling; in the experiments it performs better than popular sampling algorithms, including FAST.
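The abstract does not give the details of SGA, so the following is only a minimal sketch of the general idea of distilling a small sample with a genetic algorithm, not the authors' method. Everything in it is an assumption made for illustration: the names `item_support`, `distance`, and `distill`, the fitness (L1 distance between single-item supports in the candidate sample and in the initial sample), and the operators (tournament selection, union-based crossover, single-index mutation, elitism).

```python
import random
from collections import Counter


def item_support(transactions):
    """Fraction of transactions that contain each item."""
    counts = Counter(item for t in transactions for item in t)
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def distance(subset, transactions, target):
    """L1 distance between item supports in the subset and the target supports."""
    support = item_support([transactions[i] for i in subset])
    items = sorted(set(target) | set(support))
    return sum(abs(support.get(i, 0.0) - target.get(i, 0.0)) for i in items)


def distill(transactions, k, pop_size=30, generations=40,
            mutation_rate=0.2, seed=0):
    """Pick k transaction indices whose item supports mimic the full set.

    Hypothetical GA sketch; returns (best_subset, best_cost_per_generation).
    """
    rng = random.Random(seed)
    n = len(transactions)
    target = item_support(transactions)

    def cost(individual):
        return distance(individual, transactions, target)

    # Each individual is a sorted tuple of k distinct transaction indices.
    population = [tuple(sorted(rng.sample(range(n), k)))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        next_gen = [population[0]]  # elitism: the best individual survives
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=cost)
            p2 = min(rng.sample(population, 3), key=cost)
            # Crossover: draw k indices from the union of both parents.
            pool = sorted(set(p1) | set(p2))
            child = set(rng.sample(pool, k))
            # Mutation: swap one chosen index for one outside the child.
            if rng.random() < mutation_rate:
                child.remove(rng.choice(sorted(child)))
                outside = [i for i in range(n) if i not in child]
                child.add(rng.choice(outside))
            next_gen.append(tuple(sorted(child)))
        population = next_gen
    best = min(population, key=cost)
    history.append(cost(best))
    return best, history


if __name__ == "__main__":
    data_rng = random.Random(1)
    items = list("abcdef")
    # Synthetic "initial sample": 200 random transactions over six items.
    data = [frozenset(i for i in items if data_rng.random() < 0.4)
            for _ in range(200)]
    best, history = distill(data, k=20)
    print("distance before/after:", history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the recorded best distance can never get worse from one generation to the next. A real association-rule setting would likely compare itemset supports rather than single-item supports, which this sketch leaves out for brevity.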
Pages: 3667-3672
Number of pages: 6