Effective data mining by integrating genetic algorithm into the data preprocessing phase

Cited by: 0
Authors
Gopalan, J [1 ]
Korkmaz, E [1 ]
Alhajj, R [1 ]
Barker, K [1 ]
Affiliations
[1] Univ Calgary, Dept Comp Sci, Calgary, AB T2N 1N4, Canada
Keywords
pre-processing; data mining; classification; association; genetic algorithms; clustering; data-splitting;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dividing a dataset into a training set and a test set is a fundamental component of the pre-processing phase of data mining (DM). The choice of training set is an important factor in deriving good classification rules. The traditional approach to association rule mining divides the dataset into a training set and a test set based on statistical methods. In this paper, we highlight the weaknesses of the existing approach and propose a new methodology that employs a genetic algorithm (GA) in the process. In our approach, the original dataset is divided into sample and validation sets. A GA is then used to find an appropriate split of the sample set into training and test sets. We demonstrate through experiments that using the resulting training set as the input to an association rule mining algorithm generates high-accuracy classification rules. The rules are tested for accuracy on the validation set. The results are very satisfactory; they demonstrate the applicability and effectiveness of our approach.
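The two-stage split described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the chromosome encoding, the fitness function, or the GA operators, so the bit-mask encoding, the test-set-accuracy fitness, the truncation selection with one-point crossover, and every name below (`train_rules`, `fitness`, `ga_split`) are hypothetical. A simple majority-label lookup stands in for the association rule mining algorithm.

```python
import random

def train_rules(train):
    """Stand-in learner (assumption): map each feature tuple to its majority label."""
    counts = {}
    for features, label in train:
        per_label = counts.setdefault(features, {})
        per_label[label] = per_label.get(label, 0) + 1
    rules = {f: max(c, key=c.get) for f, c in counts.items()}
    labels = [label for _, label in train]
    default = max(sorted(set(labels)), key=labels.count) if labels else None
    return rules, default

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    rules, default = model
    if not rows:
        return 0.0
    return sum(rules.get(f, default) == y for f, y in rows) / len(rows)

def fitness(mask, sample):
    """Assumed fitness: accuracy on the test part of rules learned on the training part."""
    train = [row for row, bit in zip(sample, mask) if bit]
    test = [row for row, bit in zip(sample, mask) if not bit]
    if not train or not test:
        return 0.0  # degenerate split: one side is empty
    return accuracy(train_rules(train), test)

def ga_split(sample, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Search for a bit mask over `sample`: 1 = training row, 0 = test row."""
    rng = random.Random(seed)
    n = len(sample)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, sample), reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection, parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, sample))

# Synthetic data (made up for illustration): the label depends on the first feature.
data_rng = random.Random(1)
data = []
for _ in range(60):
    a, b = data_rng.randrange(3), data_rng.randrange(2)
    data.append(((a, b), 1 if a == 0 else 0))

# Stage 1: hold out a validation set that the GA never sees.
sample, validation = data[:48], data[48:]

# Stage 2: let the GA choose the training/test split of the sample set.
best_mask = ga_split(sample)
training = [row for row, bit in zip(sample, best_mask) if bit]
validation_accuracy = accuracy(train_rules(training), validation)
```

Keeping the validation set outside the GA loop matters in this sketch: the fitness is computed on the test part of the sample set, so only the held-out validation rows give an estimate of accuracy that the search has not optimized against.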
Pages: 331 / 336
Page count: 6
Related papers
50 in total
  • [31] Data Preprocessing Method on Data Mining of Web Log File
    Li, Jia
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL AND INFORMATION SCIENCES (ICCIS 2014), 2014, : 712 - 717
  • [32] An Efficient PANN Algorithm for Effective Spatial Data Mining
    Saranya, N. Naga
    Megala, S.
    Revathi, P.
    Nadiammai, G. V.
    Krishnaveni, S.
    Hemalatha, M.
    COMPUTATIONAL INTELLIGENCE AND INFORMATION TECHNOLOGY, 2011, 250 : 705 - +
  • [33] Genetic algorithm restricted by Tabu Lists in Data Mining
    Lopes, FM
    Pozo, ATR
    SCCC 2001: XXI INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY, PROCEEDINGS, 2001, : 178 - 185
  • [34] An innovative data collection method to eliminate the preprocessing phase in web usage mining
    Canay, Ozkan
    Kocabicak, Umit
    ENGINEERING SCIENCE AND TECHNOLOGY-AN INTERNATIONAL JOURNAL-JESTECH, 2023, 40
  • [35] A Review Paper on Data Preprocessing: A Critical Phase in Web Usage Mining Process
    Dwivedi, Sanjay Kumar
    Rawat, Bhupesh
    2015 INTERNATIONAL CONFERENCE ON GREEN COMPUTING AND INTERNET OF THINGS (ICGCIOT), 2015, : 506 - 510
  • [36] Data preprocessing by sequential pattern mining for LZW
    Vergara-Villegas, OO
    García-Hernández, RA
    Carrasco-Ochoa, JA
    Elías, RP
    Martínez-Trinidad, JF
    Sixth Mexican International Conference on Computer Science, Proceedings, 2005, : 82 - 87
  • [37] Smart Preprocessing Improves Data Stream Mining
    Hu, Hanqing
    Kantardzic, Mehmed
    PROCEEDINGS OF THE 49TH ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (HICSS 2016), 2016, : 1749 - 1757
  • [38] Data squashing as preprocessing in association rule mining
    Fister, Iztok
    Fister, Iztok, Jr.
    Novak, Damijan
    Verber, Domen
    2022 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2022, : 1720 - 1725
  • [39] Study on Data Preprocessing Process in Web Mining
    Peng, Sumian
    Zhou, Xingmei
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON INFORMATION, ELECTRONIC AND COMPUTER SCIENCE, VOLS I AND II, 2009, : 19 - 22
  • [40] Discretization and grouping: Preprocessing steps for data mining
    Berka, P
    Bruha, I
    PRINCIPLES OF DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 1510 : 239 - 245