Complexity of rule sets in mining incomplete data using characteristic sets and generalized maximal consistent blocks

Cited by: 3
Authors
Clark, Patrick G. [1]
Gao, Cheng [1]
Grzymala-Busse, Jerzy W. [1, 2]
Mroczek, Teresa [2]
Niemiec, Rafal [2]
Affiliations
[1] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
[2] Univ Informat Technol & Management, Dept Expert Syst & Artificial Intelligence, PL-35225 Rzeszow, Poland
Keywords
Incomplete data; characteristic sets; maximal consistent blocks; MLEM2 rule induction algorithm; probabilistic approximations
DOI
10.1093/jigpal/jzaa041
Chinese Library Classification number
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
In this paper, missing attribute values in incomplete data sets have three possible interpretations: lost values, attribute-concept values and 'do not care' conditions. For rule induction, we use characteristic sets and generalized maximal consistent blocks. Therefore, we apply six different approaches for data mining. As follows from our previous experiments, where we used an error rate evaluated by ten-fold cross validation as the main criterion of quality, no approach is universally the best. Thus, we decided to compare our six approaches using complexity of rule sets induced from incomplete data sets. We show that the smallest rule sets are induced from incomplete data sets with attribute-concept values, while the most complicated rule sets are induced from data sets with lost values. The choice between interpretations of missing attribute values is more important than the choice between characteristic sets and generalized maximal consistent blocks.
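The abstract names characteristic sets and three interpretations of missing values without defining them. The sketch below is one way to compute characteristic sets, assuming the definitions commonly used in the rough-set literature on incomplete data: a lost value ("?") and a 'do not care' condition ("*") place no constraint on the case, while an attribute-concept value ("-") is restricted to the values observed for that attribute in the same concept (decision class). The symbols, the toy data set and all function names are illustrative assumptions and do not come from this record.

```python
# Illustrative sketch only: symbols, data and names are assumptions,
# not taken from the paper's abstract.
LOST, DO_NOT_CARE, ATTR_CONCEPT = "?", "*", "-"
MISSING = {LOST, DO_NOT_CARE, ATTR_CONCEPT}


def concept_values(cases, decisions, attr, i):
    """Specified values of `attr` among cases sharing case i's decision."""
    return {
        cases[j][attr]
        for j in range(len(cases))
        if decisions[j] == decisions[i] and cases[j][attr] not in MISSING
    }


def attribute_value_blocks(cases, decisions, attr):
    """Map each specified value of `attr` to the set of cases in its block."""
    specified = {c[attr] for c in cases if c[attr] not in MISSING}
    blocks = {v: set() for v in specified}
    for i, case in enumerate(cases):
        v = case[attr]
        if v == LOST:
            continue  # a lost value belongs to no block
        if v == DO_NOT_CARE:
            targets = specified  # joins every block of the attribute
        elif v == ATTR_CONCEPT:
            targets = concept_values(cases, decisions, attr, i)
        else:
            targets = {v}
        for w in targets:
            blocks[w].add(i)
    return blocks


def characteristic_set(cases, decisions, attrs, i):
    """Intersect the per-attribute sets K(x, a) for case i over `attrs`."""
    result = set(range(len(cases)))
    for attr in attrs:
        v = cases[i][attr]
        if v in (LOST, DO_NOT_CARE):
            continue  # K(x, a) is the whole universe
        blocks = attribute_value_blocks(cases, decisions, attr)
        if v == ATTR_CONCEPT:
            values = concept_values(cases, decisions, attr, i)
            if not values:
                continue  # no specified value in the concept: universe
            k = set().union(*(blocks[w] for w in values))
        else:
            k = blocks[v]
        result &= k
    return result


# Hypothetical four-case data set with one missing value of each kind.
attrs = ["Temperature", "Headache"]
cases = [
    {"Temperature": "high", "Headache": "yes"},
    {"Temperature": "?", "Headache": "yes"},
    {"Temperature": "normal", "Headache": "*"},
    {"Temperature": "-", "Headache": "no"},
]
decisions = ["yes", "yes", "no", "no"]

char_sets = [characteristic_set(cases, decisions, attrs, i) for i in range(4)]
```

In this toy example the case with the lost temperature ends up with a characteristic set constrained only by its headache value, which hints at why the interpretation of a missing value can change the sets that rule induction starts from.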
Pages: 124-137
Number of pages: 14
Related papers
50 in total
  • [31] Tree structure for efficient data mining using rough sets
    Ananthanarayana, VS
    Murty, MN
    Subramanian, DK
    PATTERN RECOGNITION LETTERS, 2003, 24 (06) : 851 - 862
  • [32] Mining large engineering data sets on the grid using AURA
    Liang, B
    Austin, J
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 430 - 436
  • [33] Data mining a prostate cancer dataset using rough sets
    Revett, Kenneth
    de Magalhaes, Sergio Tenreiro
    Santos, Henrique A. D.
    2006 3RD INTERNATIONAL IEEE CONFERENCE INTELLIGENT SYSTEMS, VOLS 1 AND 2, 2006, : 285 - 288
  • [34] Data mining in intelligent tutoring systems using rough sets
    Attia, SS
    Mahdi, HMK
    Mohammad, HK
    ICEEC'04: 2004 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTER ENGINEERING, PROCEEDINGS, 2004, : 179 - 184
  • [35] Application of data mining based on rough sets and association rule in laminar cooling system
    State Key Laboratory of Rolling and Automation, Northeastern University, Shenyang 110004, China
    Unknown
    Dongbei Daxue Xuebao/Journal of Northeastern University, 2007, 28 (11): : 1583 - 1585
  • [36] A rough sets based characteristic relation approach for dynamic attribute generalization in data mining
    Li, Tianrui
    Ruan, Da
    Geert, Wets
    Song, Jing
    Xu, Yang
    KNOWLEDGE-BASED SYSTEMS, 2007, 20 (05) : 485 - 494
  • [37] On Software Fault Prediction by Mining Software Complexity Data with Dynamically Filtered Training Sets
    Podgorelec, Vili
    PROCEEDINGS OF THE 9TH WSEAS INTERNATIONAL CONFERENCE ON SIMULATION, MODELLING AND OPTIMIZATION, 2009, : 332 - +
  • [38] Generalized association rule mining using an efficient data structure
    Wu, Chieh-Ming
    Huang, Yin-Fu
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (06) : 7277 - 7290
  • [39] HIV data analysis via rule extraction using rough sets
    Tettey, Thando
    Nelwamondo, Fulufhelo V.
    Marwala, Tshilidzi
    INES 2007: 11TH INTERNATIONAL CONFERENCE ON INTELLIGENT ENGINEERING SYSTEMS, PROCEEDINGS, 2007, : 105 - +
  • [40] Attribute relevance in multiclass data sets using the naive Bayes rule
    Sotoca, JM
    Sánchez, JS
    Pla, F
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 3, 2004, : 426 - 429