Effective data mining by integrating genetic algorithm into the data preprocessing phase

Cited by: 0
Authors
Gopalan, J [1]
Korkmaz, E [1]
Alhajj, R [1]
Barker, K [1]
Affiliation
[1] Univ Calgary, Dept Comp Sci, Calgary, AB T2N 1N4, Canada
Keywords
pre-processing; data mining; classification; association; genetic algorithms; clustering; data-splitting;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dividing a dataset into a training set and a test set is a fundamental component of the pre-processing phase of data mining (DM). The choice of the training set is an important factor in deriving good classification rules. The traditional approach to association rule mining divides the dataset into a training set and a test set based on statistical methods. In this paper, we highlight the weaknesses of the existing approach and propose a new methodology that employs a genetic algorithm (GA) in the process. In our approach, the original dataset is first divided into a sample set and a validation set. A GA is then used to find an appropriate split of the sample set into training and test sets. We demonstrate through experiments that using the resulting training set as the input to an association rule mining algorithm generates highly accurate classification rules. The rules are tested on the validation set for accuracy. The results are very satisfactory; they demonstrate the applicability and effectiveness of our approach.
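The abstract does not give the chromosome encoding, the fitness function, or the GA operators, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a bit-mask chromosome (1 = training record, 0 = test record), a fitness equal to the accuracy on the test part of rules learned from the training part, truncation selection with one-point crossover, and a toy majority-vote rule learner standing in for a real association rule mining algorithm. All function names, parameters, and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def learn_rules(train):
    """Stand-in rule learner (assumption): map each (attribute index, value)
    pair to its majority class label; not a real association rule miner."""
    counts = defaultdict(Counter)
    for features, label in train:
        for i, value in enumerate(features):
            counts[(i, value)][label] += 1
    rules = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    default = Counter(label for _, label in train).most_common(1)[0][0]
    return rules, default

def predict(model, features):
    """Let every matching rule vote; fall back to the default class."""
    rules, default = model
    votes = Counter(rules[(i, v)] for i, v in enumerate(features) if (i, v) in rules)
    return votes.most_common(1)[0][0] if votes else default

def accuracy(model, records):
    return sum(predict(model, f) == y for f, y in records) / len(records)

def fitness(mask, sample):
    """Accuracy on the test part of rules learned from the training part."""
    train = [r for r, bit in zip(sample, mask) if bit]
    test = [r for r, bit in zip(sample, mask) if not bit]
    if not train or not test:
        return 0.0  # degenerate split: one side is empty
    return accuracy(learn_rules(train), test)

def ga_split(sample, pop_size=20, generations=30, mutation_rate=0.02, seed=0):
    """Search for a train/test bit mask over the sample set with a simple GA."""
    rng = random.Random(seed)
    n = len(sample)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(m, sample), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, sample))

# Toy data (hypothetical): the label simply equals the first attribute.
data_rng = random.Random(1)
records = [((a, b), a) for a in (0, 1) for b in (0, 1, 2)] * 10
data_rng.shuffle(records)

# Step 1: split the original dataset into sample and validation sets.
sample, validation = records[:40], records[40:]

# Step 2: let the GA choose the training/test split of the sample set.
best_mask = ga_split(sample)
training_set = [r for r, bit in zip(sample, best_mask) if bit]

# Step 3: learn rules from the chosen training set, check them on validation data.
validation_accuracy = accuracy(learn_rules(training_set), validation)
print(f"training size: {len(training_set)}, validation accuracy: {validation_accuracy:.2f}")
```

Keeping the validation set out of the GA loop matters here: the fitness is computed on the test part of the sample set only, so the validation accuracy remains an independent check on the rules.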
Pages: 331-336
Number of pages: 6
Related papers
50 in total
  • [1] Preprocessing DNS Log Data for Effective Data Mining
    Snyder, Mark E.
    Sundaram, Ravi
    Thakur, Mayur
    2009 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, VOLS 1-8, 2009, : 1366 - +
  • [2] Data Preprocessing Algorithm for Web Structure Mining
    Sharma, Suvarn
    Bhagat, Amit
    2016 FIFTH INTERNATIONAL CONFERENCE ON ECO-FRIENDLY COMPUTING AND COMMUNICATION SYSTEMS (ICECCS), 2016, : 94 - 98
  • [3] An effective Data Preprocessing method for Web Usage Mining
    Reddy, K. Sudheer
    Reddy, M. Kantha
    Sitaramulu, V.
    2013 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2013, : 7 - 10
  • [4] DB-HReduction: A data preprocessing algorithm for data mining applications
    Hu, XH
    APPLIED MATHEMATICS LETTERS, 2003, 16 (06) : 889 - 895
  • [5] Study on data preprocessing algorithm in web log mining
    Yuan, F
    Wang, LJ
    Yu, G
    2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 28 - 32
  • [6] Data Preprocessing for Web Data Mining
    Zhang, Wei
    Chen, Tinggui
    ADVANCES IN ELECTRONIC COMMERCE, WEB APPLICATION AND COMMUNICATION, VOL 2, 2012, 149 : 303 - +
  • [7] Data preprocessing in predictive data mining
    Alexandropoulos, Stamatios-Aggelos N.
    Kotsiantis, Sotiris B.
    Vrahatis, Michael N.
    KNOWLEDGE ENGINEERING REVIEW, 2019, 34
  • [8] Preprocessing of Alarm Data for Data Mining
    Mannani, Zahra
    Izadi, Iman
    Ghadiri, Nasser
    INDUSTRIAL & ENGINEERING CHEMISTRY RESEARCH, 2019, 58 (26) : 11261 - 11274
  • [9] An Effective Clustering Algorithm for Data Mining
    Vijendra, Singh
    Ashwini, Kelkar
    Laxman, Sahoo
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA STORAGE AND DATA ENGINEERING (DSDE 2010), 2010, : 250 - 253
  • [10] Efficient Data Preprocessing for Genetic-Fuzzy Mining with MapReduce
    Hong, Tzung-Pei
    Liu, Yu-Yang
    Wu, Min-Thai
    Tsai, Chun-Wei
    2015 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TW), 2015, : 88 - 89