Top 10 algorithms in data mining

Cited by: 3580
Authors
Wu, Xindong [1]
Kumar, Vipin [2]
Quinlan, J. Ross [3]
Ghosh, Joydeep [4]
Yang, Qiang [5]
Motoda, Hiroshi [6]
McLachlan, Geoffrey J. [7]
Ng, Angus [8]
Liu, Bing [9]
Yu, Philip S. [10]
Zhou, Zhi-Hua [11]
Steinbach, Michael [12]
Hand, David J. [13]
Steinberg, Dan [14]
Affiliations
[1] Univ Vermont, Dept Comp Sci, Burlington, VT USA
[2] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN USA
[3] RuleQuest Res Pty Ltd, St Ives, NSW, Australia
[4] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
[5] Hong Kong Univ Sci & Technol, Dept Comp Sci, Hong Kong, Peoples R China
[6] Osaka Univ, AFOSR AOARD, Tokyo 10600326, Japan
[7] Univ Queensland, Dept Math, Brisbane, Qld, Australia
[8] Griffith Univ, Sch Med, Brisbane, Qld, Australia
[9] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
[10] IBM TJ Watson Res Ctr, Hawthorne, NY 10532 USA
[11] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210008, Peoples R China
[12] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
[13] Imperial Coll, Dept Math, London, England
[14] Maxwell Labs Inc, Salford Syst, San Diego, CA 92123 USA
DOI
10.1007/s10115-007-0114-2
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These algorithms are among the most influential in the data mining research community. For each algorithm, we provide a description, discuss its impact, and review current and further research on it. Together, the 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, all of which are among the most important topics in data mining research and development.
Pages: 1-37
Page count: 37
Related papers
92 entries in total
[1] Agrawal, R. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94), 1994, p. 487.
[2] Ahmed, S.; Coenen, F.; Leng, P. Tree-based partitioning of data for association rule mining [J]. Knowledge and Information Systems, 2006, 10(3): 315-331.
[3] [Anonymous]. P 1998 ACM SIGMOD IN.
[4] [Anonymous]. P INT KDD WORKSH TEX, 2000.
[5] [Anonymous]. Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'00), 2000.
[6] Banerjee, A. Journal of Machine Learning Research, 2005, 6: 1705.
[7] Bezdek, J. C.; Chuah, S. K.; Leep, D. Generalized k nearest neighbor rules [J]. Fuzzy Sets and Systems, 1986, 18(3): 237-256.
[8] Blewitt, M. E.; Gendrel, A.-V.; Pang, Z.; Sparrow, D. B.; Whitelaw, N.; Craig, J. M.; Apedaile, A.; Hilton, D. J.; Dunwoodie, S. L.; Brockdorff, N.; Kay, G. F.; Whitelaw, E. SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation [J]. Nature Genetics, 2008, 40(5): 663-669.
[9] Bloch, D. A.; Olshen, R. A.; Walker, M. G. Risk estimation for classification trees [J]. Journal of Computational and Graphical Statistics, 2002, 11(2): 263-288.
[10] Bonchi, F.; Lucchese, C. On condensed representations of constrained frequent patterns [J]. Knowledge and Information Systems, 2006, 9(2): 180-201.