Mining classification rules from datasets with large number of many-valued attributes

Cited by: 0
Authors
Giuffrida, G [1]
Chu, WW
Hanssens, DM
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
[2] Univ Calif Los Angeles, Anderson Grad Sch Management, Los Angeles, CA 90024 USA
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Decision tree induction algorithms scale well to large datasets because of their univariate, divide-and-conquer approach. However, they may fail to discover effective knowledge when the input dataset consists of a large number of uncorrelated many-valued attributes. In this paper we present an algorithm, Noah, that tackles this problem by applying a multivariate search. A multivariate search consumes much more computation time and memory, which may be prohibitive for large datasets. We remedy this problem by exploiting effective pruning strategies and efficient data structures. We applied our algorithm to a real marketing application of cross-selling. Experimental results revealed that the application database was too complex for C4.5, which failed to discover any useful knowledge. The application database was also too large for various well-known rule discovery algorithms, which were unable to complete their task. The pruning techniques used in Noah are general in nature and can be used in other mining systems.
Pages: 335-349
Page count: 15
Related papers
50 in total
  • [31] On Complexity of Deterministic and Nondeterministic Decision Trees for Decision Tables with Many-Valued Decisions from Closed Classes
    Ostonov, Azimkhon
    Moshkov, Mikhail
    ROUGH SETS, PT I, IJCRS 2024, 2024, 14839 : 173 - 187
  • [32] Depth of Deterministic and Nondeterministic Decision Trees for Decision Tables with Many-Valued Decisions from Closed Classes
    Ostonov, Azimkhon
    Moshkov, Mikhail
    RECENT CHALLENGES IN INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2024, PT I, 2024, 2144 : 164 - 174
  • [33] Mining Breast Cancer Classification Rules from Mammograms
    Yeh, Jinn-Yi
    Chan, Si-Wa
    Wu, Tai-Hsi
    JOURNAL OF INTELLIGENT SYSTEMS, 2016, 25 (01) : 19 - 36
  • [34] Efficient mining of high utility itemsets from large datasets
    Erwin, Alva
    Gopalan, Raj P.
    Achuthan, N. R.
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2008, 5012 : 554 - +
  • [35] Mining Association Rules from a Single Large Graph
    Huynh, Bao
    Nguyen, Lam B. Q.
    Nguyen, Duc H. M.
    Nguyen, Ngoc Thanh
    Nguyen, Hung-Son
    Pham, Tuyn
    Pham, Tri
    Nguyen, Loan T. T.
    Nguyen, Trinh D. D.
    Vo, Bay
    CYBERNETICS AND SYSTEMS, 2024, 55 (03) : 693 - 707
  • [36] Mining Short Association Rules from Large Database
    Ye, Feiyue
    Chen, Mingxia
    Qian, Jin
    2009 ASIA-PACIFIC CONFERENCE ON INFORMATION PROCESSING (APCIP 2009), VOL 1, PROCEEDINGS, 2009, : 362 - 365
  • [37] CNVineta: a data mining tool for large case-control copy number variation datasets
    Wittig, Michael
    Helbig, Ingo
    Schreiber, Stefan
    Franke, Andre
    BIOINFORMATICS, 2010, 26 (17) : 2208 - 2209
  • [38] Mining critical least association rules from students suffering study anxiety datasets
    Herawan, Tutut
    Chiroma, Haruna
    Vitasari, Prima
    Abdullah, Zailani
    Ismail, Maizatul Akmar
    Othman, Mohd Khalit
    QUALITY & QUANTITY, 2015, 49 (06) : 2527 - 2547
  • [40] Association Rule Mining from large datasets of clinical invoices document
    Agapito, Giuseppe
    Calabrese, Barbara
    Guzzi, Pietro Hiram
    Graziano, Sabrina
    Cannataro, Mario
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 2232 - 2238