Complexity of rule sets in mining incomplete data using characteristic sets and generalized maximal consistent blocks

Cited by: 3

Authors:

Clark, Patrick G. [1]

Gao, Cheng [1]

Grzymala-Busse, Jerzy W. [1,2]

Mroczek, Teresa [2]

Niemiec, Rafal [2]

Affiliations:

[1] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA

[2] Univ Informat Technol & Management, Dept Expert Syst & Artificial Intelligence, PL-35225 Rzeszow, Poland

Source:

LOGIC JOURNAL OF THE IGPL | 2021 / Vol. 29 / Issue 02

Keywords:

Incomplete data; characteristic sets; maximal consistent blocks; MLEM2 rule induction algorithm; probabilistic approximations;

DOI:

10.1093/jigpal/jzaa041

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

In this paper, missing attribute values in incomplete data sets have three possible interpretations: lost values, attribute-concept values and 'do not care' conditions. For rule induction, we use characteristic sets and generalized maximal consistent blocks. Therefore, we apply six different approaches for data mining. As follows from our previous experiments, where we used an error rate evaluated by ten-fold cross validation as the main criterion of quality, no approach is universally the best. Thus, we decided to compare our six approaches using complexity of rule sets induced from incomplete data sets. We show that the smallest rule sets are induced from incomplete data sets with attribute-concept values, while the most complicated rule sets are induced from data sets with lost values. The choice between interpretations of missing attribute values is more important than the choice between characteristic sets and generalized maximal consistent blocks.
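As an illustration of the characteristic sets named in the abstract, the following minimal Python sketch computes them for a tiny hypothetical decision table. The definitions of attribute-value blocks and characteristic sets used here are the ones commonly given in the rough-set literature on incomplete data; they are an assumption of this sketch and are not stated in this record. The symbols `?` (lost value), `*` ('do not care' condition) and `-` (attribute-concept value), the sample data and all function names are likewise assumptions. Generalized maximal consistent blocks and the MLEM2 algorithm are not covered.

```python
from typing import Dict, List, Set

# Assumed symbols for the three interpretations of a missing value.
LOST, DONT_CARE, ATTR_CONCEPT = "?", "*", "-"
MISSING = {LOST, DONT_CARE, ATTR_CONCEPT}

Row = Dict[str, str]


def concept_values(cases: List[Row], decisions: List[str], x: int, a: str) -> Set[str]:
    """Specified values of attribute a among cases in the same concept as case x."""
    return {
        row[a]
        for row, d in zip(cases, decisions)
        if d == decisions[x] and row[a] not in MISSING
    }


def block(cases: List[Row], decisions: List[str], a: str, v: str) -> Set[int]:
    """Attribute-value block [(a, v)] under the assumed definitions."""
    members: Set[int] = set()
    for x, row in enumerate(cases):
        val = row[a]
        if val == v or val == DONT_CARE:
            # A 'do not care' case belongs to every block of attribute a.
            members.add(x)
        elif val == ATTR_CONCEPT and v in concept_values(cases, decisions, x, a):
            # An attribute-concept case belongs to blocks of values seen in its concept.
            members.add(x)
        # A lost value ("?") belongs to no block of attribute a.
    return members


def characteristic_set(cases: List[Row], decisions: List[str], x: int) -> Set[int]:
    """Intersection of K(x, a) over all attributes a."""
    universe = set(range(len(cases)))
    result = set(universe)
    for a, val in cases[x].items():
        if val in (LOST, DONT_CARE):
            k = universe
        elif val == ATTR_CONCEPT:
            values = concept_values(cases, decisions, x, a)
            k = set().union(*(block(cases, decisions, a, v) for v in values)) if values else universe
        else:
            k = block(cases, decisions, a, val)
        result &= k
    return result


# Hypothetical four-case table; decisions[i] is the concept of cases[i].
cases = [
    {"Temperature": "high", "Headache": "yes"},
    {"Temperature": "?", "Headache": "yes"},
    {"Temperature": "normal", "Headache": "*"},
    {"Temperature": "-", "Headache": "no"},
]
decisions = ["yes", "yes", "no", "no"]
```

For this sample table, `characteristic_set(cases, decisions, 1)` evaluates to `{0, 1, 2}`: the lost Temperature value places no restriction on case 1, while the block for `Headache = yes` also admits case 2 through its 'do not care' entry.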


Pages: 124-137

Number of pages: 14

50 records in total

[21] Application of self-organising maps for data mining with incomplete data sets
Wang, SH
NEURAL COMPUTING & APPLICATIONS, 2003, 12 (01) : 42 - 48
[22] Complexity of Rule Sets Induced from Data Sets with Many Lost and Attribute-Concept Values
Clark, Patrick G.
Gao, Cheng
Grzymala-Busse, Jerzy W.
ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, (ICAISC 2016), PT II, 2016, 9693 : 27 - 36
[23] Generalized zone separation functionals for convex perfect forms and incomplete data sets
Kaiser, MJ
INTERNATIONAL JOURNAL OF MACHINE TOOLS & MANUFACTURE, 1998, 38 (04) : 375 - 404
[24] A Generalized MapReduce Approach for Efficient mining of Large data Sets in the GRID
Roehm, Matthias
Grabert, Matthias
Schweiggert, Franz
PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, GRIDS, AND VIRTUALIZATION (CLOUD COMPUTING 2010), 2010 : 14 - 19
[25] Parallel Distributed Genetic Rule Selection for Data Mining from Large Data Sets
Nojima, Yusuke
Mihara, Shingo
Ishibuchi, Hisao
SIMULATION AND MODELING RELATED TO COMPUTATIONAL SCIENCE AND ROBOTICS TECHNOLOGY, 2012, 37 : 140 - 154
[26] Reconstruction of Incomplete Data Sets or Images Using Direct Sampling
Gregoire Mariethoz
Philippe Renard
Mathematical Geosciences, 2010, 42 : 245 - 268
[27] Reconstruction of Incomplete Data Sets or Images Using Direct Sampling
Mariethoz, Gregoire
Renard, Philippe
MATHEMATICAL GEOSCIENCES, 2010, 42 (03) : 245 - 268
[28] Extended Association Rule Mining and Its Application to Software Engineering Data Sets
Saito, Hidekazu
Nishiura, Kinari
Monden, Akito
Morisaki, Shuji
INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2024, 34 (11) : 1735 - 1756
[29] Global and componentwise extrapolation for accelerating data mining from large incomplete data sets with the EM algorithm
Hsu, Chun-Nan
Huang, Han-Shen
Yang, Bo-Hou
ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006 : 265 - +
[30] A technique for preparing thermodynamic property tables using incomplete data sets
Duarte-Garza, H
Holste, JC
Hall, KR
Iglesias-Silva, GA
FLUID PHASE EQUILIBRIA, 1998, 146 (1-2) : 123 - 137
