Effective data mining by integrating genetic algorithm into the data preprocessing phase

Times cited: 0
Authors
Gopalan, J [1]
Korkmaz, E [1]
Alhajj, R [1]
Barker, K [1]
Affiliation
[1] Univ Calgary, Dept Comp Sci, Calgary, AB T2N 1N4, Canada
Keywords
pre-processing; data mining; classification; association; genetic algorithms; clustering; data-splitting
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Dividing a data set into a training set and a test set is a fundamental component of the pre-processing phase of data mining (DM). The choice of training set, in particular, is an important factor in deriving good classification rules. The traditional approach to association rule mining divides the dataset into training and test sets using statistical methods. In this paper, we highlight the weaknesses of this approach and propose a new methodology that employs a genetic algorithm (GA) in the process. In our approach, the original dataset is first divided into a sample set and a validation set. A GA is then used to find an appropriate split of the sample set into training and test sets. We demonstrate through experiments that using the resulting training set as input to an association rule mining algorithm generates highly accurate classification rules; the accuracy of these rules is measured on the validation set. The results are very satisfactory and demonstrate the applicability and effectiveness of our approach.
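The abstract describes the pipeline only at a high level: hold out a validation set, let a GA decide which sample records go to training and which go to testing, derive classification rules from the training set, and check them on the validation set. The Python sketch below illustrates that flow under loudly stated assumptions. The chromosome is one bit per sample record (1 = training, 0 = test); the fitness function (test accuracy minus a penalty for drifting from a hypothetical 70% training ratio) is an illustrative choice, since the abstract does not give the paper's fitness function; and a simple nearest-centroid classifier stands in for the association-rule classifier purely to keep the example short. All function names (`ga_split`, `fitness`, `make_data`, etc.), the synthetic data, and every GA parameter are hypothetical.

```python
import random
from collections import Counter

# Hypothetical sketch: a GA picks which sample records go to training (bit 1)
# and which go to testing (bit 0). A nearest-centroid classifier stands in for
# the association-rule classifier used in the paper.

def nearest_centroid_fit(rows):
    """Return {label: mean feature vector} for the given (features, label) rows."""
    sums, counts = {}, Counter()
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(model, x):
    # Choose the class whose centroid is closest in squared Euclidean distance.
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))

def accuracy(model, rows):
    return sum(predict(model, x) == y for x, y in rows) / len(rows) if rows else 0.0

def split(sample, mask):
    train = [r for r, bit in zip(sample, mask) if bit]
    test = [r for r, bit in zip(sample, mask) if not bit]
    return train, test

def fitness(sample, mask, train_ratio=0.7):
    """Assumed fitness: test accuracy minus a penalty for missing the target ratio."""
    train, test = split(sample, mask)
    # Reject splits with an empty test set or a class absent from training.
    if not test or {y for _, y in train} != {y for _, y in sample}:
        return float("-inf")
    acc = accuracy(nearest_centroid_fit(train), test)
    return acc - abs(len(train) / len(sample) - train_ratio)

def ga_split(sample, pop_size=30, generations=40, p_mut=0.02, rng=None):
    rng = rng or random.Random(0)
    n = len(sample)
    pop = [[int(rng.random() < 0.7) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(sample, m), reverse=True)
        elite = ranked[: pop_size // 2]  # truncation selection; elites survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation moves a record between training and test.
            children.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = elite + children
    return max(pop, key=lambda m: fitness(sample, m))

def make_data(rng, n=200):
    # Synthetic two-class data: class 0 around (0, 0), class 1 around (2, 2).
    rows = []
    for _ in range(n):
        y = rng.randrange(2)
        rows.append(([rng.gauss(2 * y, 1.0), rng.gauss(2 * y, 1.0)], y))
    return rows

rng = random.Random(42)
data = make_data(rng)
cut = int(0.8 * len(data))
sample, validation = data[:cut], data[cut:]  # step 1: sample vs. validation set
best = ga_split(sample, rng=rng)             # step 2: GA splits the sample set
train, test = split(sample, best)
model = nearest_centroid_fit(train)          # step 3: fit stand-in classifier
print(len(train), len(test), accuracy(model, validation))  # step 4: validate
```

Swapping in an actual association-rule classifier would only change `nearest_centroid_fit` and `predict`; the GA loop and the final check on `validation` would stay the same. Note that the validation set never enters the fitness function, which mirrors the abstract's use of it as an independent accuracy check.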
Pages: 331-336
Number of pages: 6