Complexity of rule sets in mining incomplete data using characteristic sets and generalized maximal consistent blocks

Cited by: 3
Authors
Clark, Patrick G. [1]
Gao, Cheng [1]
Grzymala-Busse, Jerzy W. [1, 2]
Mroczek, Teresa [2]
Niemiec, Rafal [2]
Affiliations
[1] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
[2] Univ Informat Technol & Management, Dept Expert Syst & Artificial Intelligence, PL-35225 Rzeszow, Poland
Keywords
Incomplete data; characteristic sets; maximal consistent blocks; MLEM2 rule induction algorithm; probabilistic approximations
DOI
10.1093/jigpal/jzaa041
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In this paper, missing attribute values in incomplete data sets have three possible interpretations: lost values, attribute-concept values and 'do not care' conditions. For rule induction, we use characteristic sets and generalized maximal consistent blocks. Therefore, we apply six different approaches for data mining. As follows from our previous experiments, where we used an error rate evaluated by ten-fold cross validation as the main criterion of quality, no approach is universally the best. Thus, we decided to compare our six approaches using complexity of rule sets induced from incomplete data sets. We show that the smallest rule sets are induced from incomplete data sets with attribute-concept values, while the most complicated rule sets are induced from data sets with lost values. The choice between interpretations of missing attribute values is more important than the choice between characteristic sets and generalized maximal consistent blocks.
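The abstract names characteristic sets but does not define them. The following is a minimal Python sketch of how they are commonly computed under the three interpretations, assuming the usual rough-set conventions: `?` marks a lost value, `-` an attribute-concept value and `*` a 'do not care' condition. The function names, the symbols and the toy table are illustrative assumptions, not taken from the paper.

```python
# Assumed markers for the three interpretations of a missing attribute value.
LOST, ATTR_CONCEPT, DO_NOT_CARE = "?", "-", "*"
MISSING = {LOST, ATTR_CONCEPT, DO_NOT_CARE}


def concept_values(cases, decisions, x, a):
    """Specified values of attribute a among cases sharing the decision of case x."""
    return {
        cases[y][a]
        for y in range(len(cases))
        if decisions[y] == decisions[x] and cases[y][a] not in MISSING
    }


def attribute_value_blocks(cases, decisions, a):
    """Map each specified value v of attribute a to its block [(a, v)]."""
    values = {row[a] for row in cases if row[a] not in MISSING}
    blocks = {v: set() for v in values}
    for x, row in enumerate(cases):
        val = row[a]
        if val == LOST:
            continue  # a lost value places the case in no block
        if val == DO_NOT_CARE:
            targets = values  # the case joins every block of a
        elif val == ATTR_CONCEPT:
            targets = concept_values(cases, decisions, x, a)
        else:
            targets = {val}
        for v in targets:
            blocks[v].add(x)
    return blocks


def characteristic_sets(cases, decisions):
    """Return K(x) for every case x: the intersection of K(x, a) over all attributes."""
    universe = set(range(len(cases)))
    blocks = [
        attribute_value_blocks(cases, decisions, a) for a in range(len(cases[0]))
    ]
    result = []
    for x, row in enumerate(cases):
        k = set(universe)
        for a, val in enumerate(row):
            if val in (LOST, DO_NOT_CARE):
                continue  # K(x, a) is the whole universe
            if val == ATTR_CONCEPT:
                vs = concept_values(cases, decisions, x, a)
                if vs:  # with no specified value in the concept, K(x, a) is the universe
                    k &= set().union(*(blocks[a][v] for v in vs))
            else:
                k &= blocks[a][val]
        result.append(k)
    return result


# Hypothetical toy table: columns are (Temperature, Headache), decision is Flu.
cases = [("high", "yes"), ("?", "yes"), ("normal", "*"), ("-", "no")]
decisions = ["yes", "yes", "no", "no"]
k_sets = characteristic_sets(cases, decisions)
```

Here `k_sets[x]` holds the indices of the cases that cannot be distinguished from case `x` given the values known for `x`. Replacing one marker with another in the toy table shows how the interpretation of a missing value changes the resulting sets.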
Pages: 124-137
Page count: 14