Complexity of rule sets in mining incomplete data using characteristic sets and generalized maximal consistent blocks

Cited by: 3
Authors
Clark, Patrick G. [1 ]
Gao, Cheng [1 ]
Grzymala-Busse, Jerzy W. [1 ,2 ]
Mroczek, Teresa [2 ]
Niemiec, Rafal [2 ]
Affiliations
[1] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
[2] Univ Informat Technol & Management, Dept Expert Syst & Artificial Intelligence, PL-35225 Rzeszow, Poland
Keywords
Incomplete data; characteristic sets; maximal consistent blocks; MLEM2 rule induction algorithm; probabilistic approximations
DOI
10.1093/jigpal/jzaa041
CLC classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In this paper, missing attribute values in incomplete data sets have three possible interpretations: lost values, attribute-concept values and 'do not care' conditions. For rule induction, we use characteristic sets and generalized maximal consistent blocks. Therefore, we apply six different approaches for data mining. As follows from our previous experiments, where we used an error rate evaluated by ten-fold cross validation as the main criterion of quality, no approach is universally the best. Thus, we decided to compare our six approaches using complexity of rule sets induced from incomplete data sets. We show that the smallest rule sets are induced from incomplete data sets with attribute-concept values, while the most complicated rule sets are induced from data sets with lost values. The choice between interpretations of missing attribute values is more important than the choice between characteristic sets and generalized maximal consistent blocks.
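The abstract names characteristic sets without defining them. The sketch below follows the usual rough-set definitions for the three interpretations of missing values, stated here as an assumption since the record itself does not spell them out: a lost value ("?") places the case in no attribute-value block, a 'do not care' condition ("*") places it in every block of that attribute, and an attribute-concept value ("-") places it in the blocks of values observed for that attribute within the case's own concept. The toy data, the attribute names and the function names are all hypothetical and are not taken from the paper.

```python
LOST, CONCEPT, DONT_CARE = "?", "-", "*"
MISSING = {LOST, CONCEPT, DONT_CARE}


def concept_values(cases, decisions, index, attribute):
    """Known values of `attribute` among cases sharing the decision of case `index`."""
    return {
        case[attribute]
        for case, decision in zip(cases, decisions)
        if decision == decisions[index] and case[attribute] not in MISSING
    }


def attribute_value_blocks(cases, decisions, attribute):
    """Return {value: set of case indices} for one attribute."""
    known = {case[attribute] for case in cases if case[attribute] not in MISSING}
    blocks = {value: set() for value in known}
    for index, case in enumerate(cases):
        value = case[attribute]
        if value == LOST:
            continue  # a lost value belongs to no block
        if value == DONT_CARE:
            targets = known  # belongs to every block of this attribute
        elif value == CONCEPT:
            targets = concept_values(cases, decisions, index, attribute)
        else:
            targets = {value}
        for target in targets:
            blocks[target].add(index)
    return blocks


def characteristic_sets(cases, decisions, attributes):
    """Return the characteristic set K(x) for every case x, as sets of indices."""
    universe = set(range(len(cases)))
    blocks = {a: attribute_value_blocks(cases, decisions, a) for a in attributes}
    result = []
    for index, case in enumerate(cases):
        k = set(universe)
        for a in attributes:
            value = case[a]
            if value in (LOST, DONT_CARE):
                continue  # K(x, a) is the whole universe, so nothing to intersect
            if value == CONCEPT:
                values = concept_values(cases, decisions, index, a)
                if not values:
                    continue  # no known value in the concept: K(x, a) is the universe
                k &= set().union(*(blocks[a][v] for v in values))
            else:
                k &= blocks[a][value]
        result.append(k)
    return result


# Hypothetical toy data set: five cases, two attributes, one decision.
cases = [
    {"Temperature": "high", "Headache": "yes"},
    {"Temperature": "?", "Headache": "yes"},
    {"Temperature": "normal", "Headache": "*"},
    {"Temperature": "-", "Headache": "no"},
    {"Temperature": "high", "Headache": "no"},
]
decisions = ["flu", "flu", "no", "no", "flu"]

for index, k in enumerate(characteristic_sets(cases, decisions, ["Temperature", "Headache"])):
    print(index, sorted(k))
```

In this toy example, case 1 has a lost Temperature value, so only Headache constrains it and its characteristic set is {0, 1, 2}. Maximal consistent blocks and the MLEM2 rule induction step are not covered by this sketch.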
Pages: 124 - 137
Page count: 14
Related papers
50 items in total
  • [41] Efficiently mining maximal l-reachability co-location patterns from spatial data sets
    Zou, Muquan
    Wang, Lizhen
    Wu, Pingping
    Tran, Vanha
    INTELLIGENT DATA ANALYSIS, 2023, 27 (01) : 269 - 295
  • [42] Visual data mining of large data sets using Vitamin-S system
    Antoch, J
    NEURAL NETWORK WORLD, 2005, 15 (04) : 283 - 293
  • [43] Using Supervised Complexity Measures in the Analysis of Cancer Gene Expression Data Sets
    Costa, Ivan G.
    Lorena, Ana C.
    Peres, Liciana R. M. P. y
    de Souto, Marcilio C. P.
    ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, PROCEEDINGS, 2009, 5676 : 48 - +
  • [44] Estimation of Missing Values in Incomplete Industrial Process Data Sets Using ECM Algorithm
    Pirehgalin, Mina Fahimi
    Vogel-Heuser, Birgit
    2018 IEEE 16TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2018, : 245 - 251
  • [45] Inferring gene regulatory networks by an order independent algorithm using incomplete data sets
    Aghdam, Rosa
    Ganjali, Mojtaba
    Niloofar, Parisa
    Eslahchi, Changiz
    JOURNAL OF APPLIED STATISTICS, 2016, 43 (05) : 893 - 913
  • [46] Data mining a keystroke dynamics based biometrics database using rough sets
    Revert, Kenneth
    de Magalhaes, Sergio Tenreiro
    Santos, Henrique
    2005 PORTUGUESE CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, : 188 - 191
  • [47] Data mining based query processing using rough sets and genetic algorithms
    Srinivasa, K. G.
    Jagadish, M.
    Venugopal, K. R.
    Patnaik, L. M.
    2007 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING, VOLS 1 AND 2, 2007, : 275 - 282
  • [48] DATA MINING APPLICATION USING THE ROUGH SETS THEORY TO GENERATE INTELECTUAL CAPITAL
    Dalfovo, Oscar
    Schmitt, Sidnei
    Raboch, Henrique
    INFORMACAO & SOCIEDADE-ESTUDOS, 2010, 20 (01) : 139 - 152
  • [49] Using support vector machines for mining regression classes in large data sets
    Sun, ZH
    Gao, LX
    Sun, YX
    2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 89 - 92
  • [50] On the Comparison of Malware Detection Methods Using Data Mining with Two Feature Sets
    Srakaew, Sathaporn
    Piyanuntcharatsr, Warot
    Adulkasem, Suchitra
    Chantrapornchai, Chantana
    INTERNATIONAL JOURNAL OF SECURITY AND ITS APPLICATIONS, 2015, 9 (03) : 293 - 318