High-Performance Commercial Data Mining: A Multistrategy Machine Learning Application

Authors
William H. Hsu
Michael Welge
Tom Redman
David Clutter
Affiliations
[1] Kansas State University, Department of Computing and Information Sciences
[2] National Center for Supercomputing Applications (NCSA), Automated Learning Group
Keywords
constructive induction; scalable high-performance computing; real-world decision support applications; relevance determination; genetic algorithms; software development environments for knowledge discovery in databases (KDD)
DOI
Not available
Abstract
We present an application of inductive concept learning and interactive visualization techniques to a large-scale commercial data mining project. This paper focuses on the design and configuration of high-level optimization systems (wrappers) for relevance determination and constructive induction, and on integrating these wrappers with elicited knowledge on attribute relevance and synthesis. In particular, we discuss decision support issues for the application (cost prediction for automobile insurance markets in several states) and report experiments using D2K, a Java-based visual programming system for data mining and information visualization, and several commercial and research tools. We describe exploratory clustering, descriptive statistics, and supervised decision tree learning in this application, focusing on a parallel genetic algorithm (GA) system, Jenesis, which is used to implement relevance determination (attribute subset selection). Deployed on several high-performance network-of-workstation systems (Beowulf clusters), Jenesis achieves linear speedup owing to a high degree of task parallelism. Its test set accuracy is significantly higher than that of decision tree inducers alone and is comparable to that of the best extant search-space-based wrappers.
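The wrapper idea in the abstract (a genetic algorithm searching over attribute subsets, with each candidate scored by the accuracy of an inducer trained on only those attributes) can be illustrated with a minimal sketch. This is not Jenesis or D2K: the synthetic data, the 1-nearest-neighbour inducer standing in for a decision tree learner, and every name and parameter below (`make_data`, `accuracy_1nn`, `ga_select`, `pop_size`, `p_mut`) are assumptions made for illustration only.

```python
import random

def make_data(rng, n_rows=120, n_features=8):
    """Hypothetical synthetic data: the label depends only on attributes 0 and 1."""
    rows = []
    for _ in range(n_rows):
        x = [rng.random() for _ in range(n_features)]
        rows.append((x, int(x[0] + x[1] > 1.0)))
    return rows

def accuracy_1nn(mask, train, valid):
    """Hold-out accuracy of a 1-nearest-neighbour inducer restricted to masked attributes."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty attribute subset cannot classify anything
    correct = 0
    for x, y in valid:
        nearest = min(train, key=lambda r: sum((r[0][i] - x[i]) ** 2 for i in cols))
        correct += int(nearest[1] == y)
    return correct / len(valid)

def ga_select(train, valid, n_features, rng, pop_size=12, generations=10, p_mut=0.1):
    """GA wrapper: individuals are boolean masks, fitness is validation accuracy."""
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # each distinct subset is evaluated only once
            cache[key] = accuracy_1nn(mask, train, valid)
        return cache[key]

    # Seed the population with the full attribute set plus random subsets.
    pop = [[True] * n_features]
    pop += [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)[:]]  # elitism: carry the best mask forward
        while len(nxt) < pop_size:
            # Binary tournament selection of two parents.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            # Uniform crossover followed by bit-flip mutation.
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)

rng = random.Random(0)
data = make_data(rng)
train, valid = data[:80], data[80:]
best_mask, best_acc = ga_select(train, valid, 8, rng)
print("selected attributes:", [i for i, keep in enumerate(best_mask) if keep])
print("validation accuracy:", best_acc)
```

Each fitness evaluation depends only on its own mask and the fixed data, so the evaluations within a generation are independent of one another. That independence is the kind of task parallelism the abstract credits for the speedup on clusters; this sketch runs them serially for simplicity.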
Pages: 361-391
Page count: 30