Mining classification rules from datasets with large number of many-valued attributes

Cited by: 0
Authors
Giuffrida, G [1]
Chu, WW
Hanssens, DM
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[2] Univ Calif Los Angeles, Anderson Grad Sch Management, Los Angeles, CA 90024 USA
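Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract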
Decision tree induction algorithms scale well to large datasets because of their univariate, divide-and-conquer approach. However, they may fail to discover effective knowledge when the input dataset consists of a large number of uncorrelated many-valued attributes. In this paper we present an algorithm, Noah, that tackles this problem by applying a multivariate search. A multivariate search consumes much more computation time and memory, which may be prohibitive for large datasets. We remedy this problem by exploiting effective pruning strategies and efficient data structures. We applied our algorithm to a real marketing application of cross-selling. Experimental results revealed that the application database was too complex for C4.5, which failed to discover any useful knowledge. The application database was also too large for various well-known rule discovery algorithms, which were unable to complete their task. The pruning techniques used in Noah are general in nature and can be used in other mining systems.
Pages: 335-349
Page count: 15
Related papers
50 items in total
  • [21] Iterative social consolidations: forming beliefs from many-valued evidence and peers' opinions
    Santos, Yuri David
    Kooi, Barteld
    Verbrugge, Rineke
    JOURNAL OF LOGIC AND COMPUTATION, 2022, 32 (06) : 1142 - 1161
  • [22] Similarity reasoning in formal concept analysis: from one- to many-valued contexts
    Formica, Anna
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 60 (02) : 715 - 739
  • [23] Mining Competitors from Large Unstructured Datasets
    Valkanas, George
    Lappas, Theodoros
    Gunopulos, Dimitrios
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (09) : 1971 - 1984
  • [24] BOUNDS OF MAXIMUM NUMBER OF TERMS REQUIRED TO COMPOSE ARBITRARY MANY-VALUED 1-PLACE FUNCTIONS WITH MINIMUM NUMBER OF BASIC FUNCTIONS
    MINE, H
    KOGA, Y
    TOKUYAMA, H
    ELECTRONICS & COMMUNICATIONS IN JAPAN, 1966, 49 (10) : 36 - &
  • [25] Mining fuzzy association rules from heterogeneous probabilistic datasets
    Pei, Bin
    Zhao, Tingting
    Zhao, Suyun
    Chen, Hong
    2012 IEEE 24TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2012), VOL 1, 2012, : 828 - 835
  • [26] Deterministic and Nondeterministic Decision Trees for Decision Tables with Many-Valued Decisions from Closed Classes
    Ostonov, Azimkhon
    Moshkov, Mikhail
    ROUGH SETS, IJCRS 2023, 2023, 14481 : 89 - 104
  • [27] Effect of similar behaving attributes in mining of fuzzy association rules in the large databases
    Farzanyar, Zahra
    Kangavari, Mohammadreza
    Hashemi, Sattar
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2006, PT 1, 2006, 3980 : 1100 - 1109
  • [28] Multistep classification based on atomic and associative rules in the large-scale datasets
    School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China
    Unknown
    Kong Zhi Li Lun Yu Ying Yong, 2007, 3 (471-474):
  • [29] Mining Flipping Correlations from Large Datasets with Taxonomies
    Barsky, Marina
    Kim, Sangkyum
    Weninger, Tim
    Han, Jiawei
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2011, 5 (04) : 370 - 381
  • [30] ScalParC: A new scalable and efficient parallel classification algorithm for mining large datasets
    Joshi, MV
    Karypis, G
    Kumar, V
    FIRST MERGED INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM & SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING, 1998, : 573 - 579