Complexity of rule sets in mining incomplete data using characteristic sets and generalized maximal consistent blocks

Cited by: 3
Authors
Clark, Patrick G. [1]
Gao, Cheng [1]
Grzymala-Busse, Jerzy W. [1, 2]
Mroczek, Teresa [2]
Niemiec, Rafal [2]
Affiliations
[1] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
[2] Univ Informat Technol & Management, Dept Expert Syst & Artificial Intelligence, PL-35225 Rzeszow, Poland
Keywords
Incomplete data; characteristic sets; maximal consistent blocks; MLEM2 rule induction algorithm; probabilistic approximations
DOI
10.1093/jigpal/jzaa041
Chinese Library Classification (CLC) number
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
In this paper, missing attribute values in incomplete data sets have three possible interpretations: lost values, attribute-concept values and 'do not care' conditions. For rule induction, we use characteristic sets and generalized maximal consistent blocks. Therefore, we apply six different approaches for data mining. As follows from our previous experiments, where we used an error rate evaluated by ten-fold cross validation as the main criterion of quality, no approach is universally the best. Thus, we decided to compare our six approaches using complexity of rule sets induced from incomplete data sets. We show that the smallest rule sets are induced from incomplete data sets with attribute-concept values, while the most complicated rule sets are induced from data sets with lost values. The choice between interpretations of missing attribute values is more important than the choice between characteristic sets and generalized maximal consistent blocks.
Pages: 124-137
Number of pages: 14
Related papers
50 items in total
  • [11] On Measuring Inconsistency Using Maximal Consistent Sets
    Ammoura, Meriem
    Raddaoui, Badran
    Salhi, Yakoub
    Oukacha, Brahim
    SYMBOLIC AND QUANTITATIVE APPROACHES TO REASONING WITH UNCERTAINTY, ECSQARU 2015, 2015, 9161 : 267 - 276
  • [12] On generalized quantifiers, finite sets and data mining
    Hájek, P
    INTELLIGENT INFORMATION PROCESSING AND WEB MINING, 2003, : 489 - 496
  • [13] Multicost Decision-Theoretic Rough Sets Based on Maximal Consistent Blocks
    Ma, Xingbin
    Yang, Xibei
    Qi, Yong
    Song, Xiaoning
    Yang, Jingyu
    ROUGH SETS AND KNOWLEDGE TECHNOLOGY, RSKT 2014, 2014, 8818 : 824 - 833
  • [14] A Comparison of Four Classification Systems Using Rule Sets Induced from Incomplete Data Sets by Local Probabilistic Approximations
    Clark, Patrick G.
    Gao, Cheng
    Grzymala-Busse, Jerzy W.
    FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2017, 2017, 10352 : 282 - 291
  • [15] On the use of conceptual reconstruction for mining massively incomplete data sets
    Parthasarathy, S
    Aggarwal, CC
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2003, 15 (06) : 1512 - 1521
  • [16] Mining from incomplete quantitative data by fuzzy rough sets
    Hong, Tzung-Pei
    Tseng, Li-Huei
    Chien, Been-Chian
    EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (03) : 2644 - 2653
  • [17] Rule Set Complexity in Mining Incomplete Data Using Global and Saturated Probabilistic Approximations
    Clark, Patrick G.
    Grzymala-Busse, Jerzy W.
    Mroczek, Teresa
    Niemiec, Rafal
    INFORMATION AND SOFTWARE TECHNOLOGIES, ICIST 2019, 2019, 1078 : 451 - 462
  • [18] Rule Set Complexity for Incomplete Data Sets with Many Attribute-Concept Values and "Do Not Care" Conditions
    Clark, Patrick G.
    Gao, Cheng
    Grzymala-Busse, Jerzy W.
    ROUGH SETS, (IJCRS 2016), 2016, 9920 : 65 - 74
  • [19] An Analysis of Probabilistic Approximations for Rule Induction from Incomplete Data Sets
    Clark, Patrick G.
    Grzymala-Busse, Jerzy W.
    Hippe, Zdzislaw S.
    FUNDAMENTA INFORMATICAE, 2014, 132 (03) : 365 - 379
  • [20] Application of self-organising maps for data mining with incomplete data sets
    S. Wang
    Neural Computing & Applications, 2003, 12 : 42 - 48