Mining classification rules from datasets with large number of many-valued attributes

Cited by: 0

Authors:

Giuffrida, G [1]

Chu, WW

Hanssens, DM

Affiliations:

[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA

[2] Univ Calif Los Angeles, Anderson Grad Sch Management, Los Angeles, CA 90024 USA

Source:

ADVANCES IN DATABASE TECHNOLOGY-EDBT 2000, PROCEEDINGS | 2000 / Vol. 1777

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Decision tree induction algorithms scale well to large datasets because of their univariate, divide-and-conquer approach. However, they may fail to discover effective knowledge when the input dataset consists of a large number of uncorrelated many-valued attributes. In this paper we present an algorithm, Noah, that tackles this problem by applying a multivariate search. Performing a multivariate search consumes much more computation time and memory, which may be prohibitive for large datasets. We remedy this problem by exploiting effective pruning strategies and efficient data structures. We applied our algorithm to a real marketing application of cross-selling. Experimental results revealed that the application database was too complex for C4.5, which failed to discover any useful knowledge. The application database was also too large for various well-known rule discovery algorithms, which were not able to complete their task. The pruning techniques used in Noah are general in nature and can be used in other mining systems.

Pages: 335-349

Page count: 15
