A heterogeneous computing system for data mining workflows

Authors
Luo, Ping
Lu, Kevin
He, Qing
Shi, Zhongzhi
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100080, Peoples R China
[2] Brunel Univ, Uxbridge UB8 3PH, Middx, England
[3] Chinese Acad Sci, Grad Sch, Beijing, Peoples R China
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The computing-intensive Data Mining (DM) process calls for the support of a Heterogeneous Computing (HC) system, which consists of multiple computers with different configurations connected by a high-speed LAN, to provide increased computational power and resources. The DM process can be described as a multi-phase pipeline in which each phase may offer many alternative methods. This makes DM workflows very complex, so they can only be modelled by a Directed Acyclic Graph (DAG). An HC system therefore needs an effective and efficient scheduling framework that orchestrates all the computing hardware to execute multiple competing DM workflows. Motivated by the need for a practical solution to the scheduling problem for DM workflows, this paper proposes a dynamic DAG scheduling algorithm designed around the characteristics of the execution-time estimation model for DM jobs. Based on an approximate estimate of job execution time, the algorithm first maps DM jobs to machines in a decentralized and diligent (as defined in this paper) manner. The performance of this initial mapping can then be improved through job migration when necessary. The scheduling heuristic considers both the minimal completion time criterion and the critical path of the DAG. We implement this system in an established Multi-Agent System (MAS) environment, in which existing DM algorithms are reused by encapsulating them into agents. Practical classification problems are used to test and measure the system's performance, and the detailed experimental procedure and result analysis are also discussed.
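The abstract names two ingredients of the scheduling heuristic: the critical path of the DAG and the minimal completion time criterion. The following sketch illustrates how these two ideas can be combined in a simple static list scheduler, similar in spirit to HEFT-style heuristics. It is not the authors' algorithm: it assumes a single workflow, a hypothetical table `est` of estimated execution times for every (job, machine) pair, and zero data-transfer cost, and it omits the decentralized mapping and job migration described in the paper. All job names, machine names, and time values are made up for illustration.

```python
"""Illustrative critical-path + minimum-completion-time list scheduler.

Assumptions (not from the paper): one workflow, a known estimated execution
time for every (job, machine) pair, and zero data-transfer cost between machines.
No decentralized mapping or job migration is modelled.
"""


def bottom_levels(succs, avg_cost):
    """Critical-path priority: longest path (by average estimated cost) from a job to an exit job."""
    memo = {}

    def level(job):
        if job not in memo:
            memo[job] = avg_cost[job] + max((level(s) for s in succs[job]), default=0.0)
        return memo[job]

    for job in succs:
        level(job)
    return memo


def schedule(est, preds):
    """Map each job to the machine giving its earliest estimated completion time.

    est[job][machine] -- hypothetical estimated execution time of job on machine
    preds[job]        -- set of jobs that must finish before job can start
    """
    jobs = list(est)
    machines = list(next(iter(est.values())))
    succs = {job: [] for job in jobs}
    for job, ps in preds.items():
        for p in ps:
            succs[p].append(job)

    avg_cost = {job: sum(est[job].values()) / len(est[job]) for job in jobs}
    priority = bottom_levels(succs, avg_cost)

    machine_free = {m: 0.0 for m in machines}  # time at which each machine becomes idle
    finish, assignment = {}, {}
    remaining = set(jobs)
    while remaining:
        # Jobs whose predecessors have all been scheduled.
        ready = [j for j in remaining if all(p in finish for p in preds.get(j, ()))]
        # Critical-path criterion: pick the ready job with the longest path to an exit job.
        job = max(ready, key=lambda j: (priority[j], j))
        inputs_ready = max((finish[p] for p in preds.get(job, ())), default=0.0)

        # Minimal completion time criterion: choose the machine that finishes this job first.
        def completion(m):
            return max(machine_free[m], inputs_ready) + est[job][m]

        best = min(machines, key=lambda m: (completion(m), m))
        finish[job] = completion(best)
        machine_free[best] = finish[job]
        assignment[job] = best
        remaining.remove(job)
    return assignment, finish


# Hypothetical DM workflow: load -> preprocess -> {c45, nb} -> evaluate
EST = {
    "load":       {"fast": 2, "slow": 4},
    "preprocess": {"fast": 3, "slow": 6},
    "c45":        {"fast": 5, "slow": 8},
    "nb":         {"fast": 2, "slow": 3},
    "evaluate":   {"fast": 1, "slow": 2},
}
PREDS = {
    "preprocess": {"load"},
    "c45": {"preprocess"},
    "nb": {"preprocess"},
    "evaluate": {"c45", "nb"},
}

assignment, finish = schedule(EST, PREDS)
print(assignment)
print("makespan:", max(finish.values()))
```

With these made-up estimates, the critical-path priority schedules `c45` before `nb`, so `c45` takes the fast machine and `nb` goes to the slow machine, where it finishes at time 8 instead of waiting for the fast machine and finishing at 12; the resulting makespan is 11 time units. Because the placement relies entirely on estimates, a dynamic scheduler would need some way, such as the migration step mentioned in the abstract, to correct placements when actual run times differ.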
Pages: 177-189
Number of pages: 13