Mining classification rules from datasets with large number of many-valued attributes

Authors
Giuffrida, G [1]
Chu, WW
Hanssens, DM
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[2] Univ Calif Los Angeles, Anderson Grad Sch Management, Los Angeles, CA 90024 USA
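DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract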
Decision tree induction algorithms scale well to large datasets because of their univariate, divide-and-conquer approach. However, they may fail to discover effective knowledge when the input dataset consists of a large number of uncorrelated many-valued attributes. In this paper we present an algorithm, Noah, that tackles this problem by applying a multivariate search. A multivariate search consumes much more computation time and memory, which may be prohibitive for large datasets. We remedy this problem by exploiting effective pruning strategies and efficient data structures. We applied our algorithm to a real marketing application of cross-selling. Experimental results revealed that the application database was too complex for C4.5, which failed to discover any useful knowledge. The application database was also too large for various well-known rule discovery algorithms, which were not able to complete their task. The pruning techniques used in Noah are general in nature and can be used in other mining systems.
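The abstract does not describe Noah's search or pruning strategies in detail. As a rough illustration of the general idea only, here is a minimal sketch of a multivariate (conjunctive) rule search with support-based pruning. It assumes a generic Apriori-style level-wise search, which is a swapped-in technique and not Noah's actual algorithm. The function names (`covers`, `mine_rules`), the parameters (`min_support`, `min_confidence`, `max_len`), and the toy cross-selling data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def covers(condition, row):
    """True if the row satisfies every (attribute, value) test in condition."""
    return all(row[attr] == value for attr, value in condition)


def mine_rules(rows, target, min_support=2, min_confidence=0.9, max_len=2):
    """Hypothetical sketch: level-wise search for (attr=value AND ...) -> class.

    A condition is kept only if it covers at least min_support rows, and a
    longer condition is generated only when all of its one-shorter
    sub-conditions were kept (support is anti-monotone).
    """
    attrs = [a for a in rows[0] if a != target]
    counts = Counter((a, row[a]) for row in rows for a in attrs)
    level = {frozenset([test]) for test, n in counts.items() if n >= min_support}
    rules = []
    for length in range(1, max_len + 1):
        kept = set()
        for cond in level:
            labels = [row[target] for row in rows if covers(cond, row)]
            if len(labels) < min_support:
                continue  # pruned: this condition is never extended
            kept.add(cond)
            label, hits = Counter(labels).most_common(1)[0]
            confidence = hits / len(labels)
            if confidence >= min_confidence:
                rules.append((tuple(sorted(cond)), label, len(labels), confidence))
        # Join kept conditions into candidates with one more test,
        # skipping candidates that test the same attribute twice.
        level = set()
        for x, y in combinations(kept, 2):
            cand = x | y
            if len(cand) != length + 1 or len({a for a, _ in cand}) != length + 1:
                continue
            if all(cand - {t} in kept for t in cand):
                level.add(cand)
    return sorted(rules)


# Hypothetical toy data: the class depends only on the combination of attributes.
rows = [
    {"region": "north", "channel": "web", "buy": "yes"},
    {"region": "north", "channel": "web", "buy": "yes"},
    {"region": "north", "channel": "store", "buy": "no"},
    {"region": "north", "channel": "store", "buy": "no"},
    {"region": "south", "channel": "web", "buy": "no"},
    {"region": "south", "channel": "web", "buy": "no"},
    {"region": "south", "channel": "store", "buy": "yes"},
    {"region": "south", "channel": "store", "buy": "yes"},
]

for cond, label, support, conf in mine_rules(rows, "buy"):
    print(cond, "->", label, f"support={support} confidence={conf:.2f}")
```

In this toy data no single attribute predicts `buy` better than chance (each value covers two `yes` rows and two `no` rows), so no one-test rule is emitted, while each of the four two-test conjunctions predicts the class with confidence 1.0. Raising `min_support` to 3 prunes every pair, and the search returns no rules. This shows, on a small scale, both why a multivariate search can find knowledge that single-attribute tests miss and how a support threshold cuts the candidate space.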
Pages: 335-349
Number of pages: 15