Effective data mining by integrating genetic algorithm into the data preprocessing phase

Cited by: 0
Authors
Gopalan, J [1]
Korkmaz, E [1]
Alhajj, R [1]
Barker, K [1]
Affiliations
[1] Univ Calgary, Dept Comp Sci, Calgary, AB T2N 1N4, Canada
Keywords
pre-processing; data mining; classification; association; genetic algorithms; clustering; data-splitting;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dividing a dataset into a training set and a test set is a fundamental component of the pre-processing phase of data mining (DM). The choice of the training set is an important factor in deriving good classification rules. The traditional approach to association rule mining divides the dataset into a training set and a test set using statistical methods. In this paper, we highlight the weaknesses of the existing approach and propose a new methodology that employs a genetic algorithm (GA) in the process. In our approach, the original dataset is divided into a sample set and a validation set. A GA is then used to find an appropriate split of the sample set into training and test sets. We demonstrate through experiments that using the resulting training set as input to an association rule mining algorithm generates high-accuracy classification rules. The rules are tested on the validation set for accuracy. The results are very satisfactory; they demonstrate the applicability and effectiveness of our approach.
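The splitting scheme described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the bitmask encoding (1 = training, 0 = test), the fitness function (test-set accuracy), the GA parameters, and the function names `one_rule_fit`, `accuracy`, `fitness`, and `ga_split` are all hypothetical. In particular, a simple one-feature majority rule stands in for the association rule mining algorithm, which the abstract does not specify.

```python
import random


def one_rule_fit(rows, labels):
    """Stand-in learner (assumption, not the paper's rule miner):
    predict the majority label seen for each value of feature 0."""
    counts = {}
    for row, y in zip(rows, labels):
        per_value = counts.setdefault(row[0], {})
        per_value[y] = per_value.get(y, 0) + 1
    default = max(set(labels), key=labels.count)  # fallback for unseen values
    rules = {value: max(c, key=c.get) for value, c in counts.items()}
    return rules, default


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    rules, default = model
    hits = sum(rules.get(r[0], default) == y for r, y in zip(rows, labels))
    return hits / len(labels)


def fitness(mask, rows, labels):
    """Score a candidate split: train on rows with bit 1, test on rows with bit 0."""
    train = [i for i, bit in enumerate(mask) if bit]
    test = [i for i, bit in enumerate(mask) if not bit]
    if not train or not test:
        return 0.0  # degenerate split: one side is empty
    model = one_rule_fit([rows[i] for i in train], [labels[i] for i in train])
    return accuracy(model, [rows[i] for i in test], [labels[i] for i in test])


def ga_split(rows, labels, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Evolve a bitmask over the sample set; returns the best mask found.
    Requires at least 2 rows and pop_size >= 4."""
    rng = random.Random(seed)
    n = len(rows)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, rows, labels), reverse=True)
        elite = ranked[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability mutation_rate
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, rows, labels))
```

To follow the workflow in the abstract, one would first hold out a validation set, call `ga_split` on the remaining sample set only, fit the learner on the rows the returned mask marks as training, and then measure accuracy on the held-out validation set.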
Pages: 331-336
Page count: 6
Related papers
50 in total
  • [21] A DATA MINING METHODOLOGY WITH PREPROCESSING STEPS
    Speckauskiene, Vita
    Lukosevicius, Arunas
    INFORMATION TECHNOLOGY AND CONTROL, 2009, 38 (04): : 319 - 324
  • [22] The GUHA method, data preprocessing and mining
    Hájek, P
    Rauch, J
    Coufal, D
    Feglar, T
    DATABASE SUPPORT FOR DATA MINING APPLICATIONS: DISCOVERING KNOWLEDGE WITH INDUCTIVE QUERIES, 2004, 2682 : 135 - 153
  • [23] DATA PREPROCESSING IN WEB TEXT MINING
    Jiang Yongbo
    FIFTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER THEORY AND ENGINEERING (ICACTE 2012), 2012, : 573 - 581
  • [24] An ordered preprocessing scheme for data mining
    Cruz, LR
    Pérez, J
    Landero, VL
    del Angel, ES
    Alvarez, VM
    Peréz, V
    PRICAI 2004: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3157 : 1007 - 1008
  • [25] Ensemble imbalance classification: Using data preprocessing, clustering algorithm and genetic algorithm
    Abolkarlou, Niloofar Afshari
    Niknafs, Ali Akbar
    Ebrahimpour, Mohammad Kazem
    2014 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2014, : 171 - 176
  • [26] Two stages of case-based reasoning - Integrating genetic algorithm with data mining mechanism
    Yang, Heng-Li
    Wang, Cheng-Shu
    EXPERT SYSTEMS WITH APPLICATIONS, 2008, 35 (1-2) : 262 - 272
  • [27] Analysis for Data Preprocessing To Prevent Direct Discrimination in Data Mining
    Aneyrao, Trupti A.
    Fadnavis, R. A.
    2016 WORLD CONFERENCE ON FUTURISTIC TRENDS IN RESEARCH AND INNOVATION FOR SOCIAL WELFARE (STARTUP CONCLAVE), 2016,
  • [28] Raw Wind Data Preprocessing: A Data-Mining Approach
    Zheng, Le
    Hu, Wei
    Min, Yong
    IEEE TRANSACTIONS ON SUSTAINABLE ENERGY, 2015, 6 (01) : 11 - 19
  • [29] An efficient data preprocessing method for mining customer survey data
    Zhang, N.
    Lu, W. F.
    2007 5TH IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS, VOLS 1-3, 2007, : 573 - +
  • [30] Using a data metric for preprocessing advice for data mining applications
    Engels, R
    Theusinger, C
    ECAI 1998: 13TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 1998, : 430 - 434