Complexity of rule sets in mining incomplete data using characteristic sets and generalized maximal consistent blocks

Cited by: 3

Authors:

Clark, Patrick G. [1]

Gao, Cheng [1]

Grzymala-Busse, Jerzy W. [1, 2]

Mroczek, Teresa [2]

Niemiec, Rafal [2]

Affiliations:

[1] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA

[2] Univ Informat Technol & Management, Dept Expert Syst & Artificial Intelligence, PL-35225 Rzeszow, Poland

Source:

LOGIC JOURNAL OF THE IGPL | 2021 / Vol. 29 / No. 02

Keywords:

Incomplete data; characteristic sets; maximal consistent blocks; MLEM2 rule induction algorithm; probabilistic approximations

DOI:

10.1093/jigpal/jzaa041

Chinese Library Classification:

O29 [Applied Mathematics]

Subject classification code:

070104

Abstract:

In this paper, missing attribute values in incomplete data sets are given three possible interpretations: lost values, attribute-concept values and 'do not care' conditions. For rule induction we use characteristic sets and generalized maximal consistent blocks; combining the three interpretations with these two options yields six different approaches to data mining. Our previous experiments, which used the error rate evaluated by ten-fold cross validation as the main quality criterion, showed that no approach is universally the best. We therefore compare the six approaches by the complexity of the rule sets induced from incomplete data sets. We show that the smallest rule sets are induced from data sets with attribute-concept values, while the most complicated rule sets are induced from data sets with lost values. The choice among interpretations of missing attribute values matters more than the choice between characteristic sets and generalized maximal consistent blocks.
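To make the three interpretations concrete, here is a minimal Python sketch of characteristic sets. It assumes the definitions commonly used in the rough-set literature on incomplete data, which this record does not spell out: `?` marks a lost value, `*` a 'do not care' condition and `-` an attribute-concept value. The decision table, attribute names and helper functions are hypothetical, and generalized maximal consistent blocks, MLEM2 and probabilistic approximations are not covered.

```python
from typing import Dict, Sequence, Set

# Assumed markers for the three interpretations of missing attribute values.
LOST, DO_NOT_CARE, ATTR_CONCEPT = "?", "*", "-"
MISSING = {LOST, DO_NOT_CARE, ATTR_CONCEPT}

Table = Dict[int, Dict[str, str]]


def concept_values(table: Table, decision: str, x: int, a: str) -> Set[str]:
    """V(x, a): specified values of a among cases in the same concept as x."""
    return {
        row[a]
        for row in table.values()
        if row[decision] == table[x][decision] and row[a] not in MISSING
    }


def blocks(table: Table, decision: str, a: str) -> Dict[str, Set[int]]:
    """Attribute-value blocks [(a, v)] for every specified value v of a."""
    values = {row[a] for row in table.values() if row[a] not in MISSING}
    result: Dict[str, Set[int]] = {v: set() for v in values}
    for x, row in table.items():
        if row[a] == LOST:
            continue  # a lost value puts x in no block of a
        if row[a] == DO_NOT_CARE:
            targets = values  # 'do not care': x belongs to every block of a
        elif row[a] == ATTR_CONCEPT:
            targets = concept_values(table, decision, x, a)
        else:
            targets = {row[a]}
        for v in targets:
            result[v].add(x)
    return result


def characteristic_set(table: Table, decision: str,
                       attributes: Sequence[str], x: int) -> Set[int]:
    """K_B(x): intersection of K(x, a) over all attributes a in B."""
    k = set(table)  # start from U, the set of all cases
    for a in attributes:
        val = table[x][a]
        if val in (LOST, DO_NOT_CARE):
            continue  # K(x, a) = U, so the intersection is unchanged
        a_blocks = blocks(table, decision, a)
        if val == ATTR_CONCEPT:
            vs = concept_values(table, decision, x, a)
            if not vs:
                continue  # assumed convention: K(x, a) = U when V(x, a) is empty
            k_xa = set().union(*(a_blocks[v] for v in vs))
        else:
            k_xa = a_blocks[val]
        k &= k_xa
    return k


# Hypothetical decision table with decision attribute "Flu".
data: Table = {
    1: {"Temperature": "high", "Headache": "?", "Flu": "yes"},
    2: {"Temperature": "very_high", "Headache": "yes", "Flu": "yes"},
    3: {"Temperature": "-", "Headache": "no", "Flu": "no"},
    4: {"Temperature": "high", "Headache": "*", "Flu": "yes"},
    5: {"Temperature": "normal", "Headache": "no", "Flu": "no"},
}

for case in data:
    print(case, sorted(characteristic_set(data, "Flu", ["Temperature", "Headache"], case)))
```

Under these assumptions, case 3's attribute-concept value for Temperature is read as the values seen in its own concept (Flu = no), giving the characteristic set {3, 5}. Case 4's `*` places it in both Headache blocks, whereas case 1's `?` places it in neither, which illustrates how the interpretation changes the blocks from which rules are later induced.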


Pages: 124-137

Number of pages: 14
