A heterogeneous computing system for data mining workflows

Authors
Luo, Ping
Lu, Kevin
He, Qing
Shi, Zhongzhi
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100080, Peoples R China
[2] Brunel Univ, Uxbridge UB8 3PH, Middx, England
[3] Chinese Acad Sci, Grad Sch, Beijing, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The computing-intensive Data Mining (DM) process calls for the support of a Heterogeneous Computing (HC) system, which consists of multiple computers with different configurations connected by a high-speed LAN, to provide increased computational power and resources. The DM process can be described as a multi-phase pipeline in which each phase may offer many optional methods. This makes DM workflows very complex, so that they can be modelled only by a Directed Acyclic Graph (DAG). An HC system therefore needs an effective and efficient scheduling framework that orchestrates all the computing hardware to execute multiple competing DM workflows. Motivated by the need for a practical solution to the scheduling problem for DM workflows, this paper proposes a dynamic DAG scheduling algorithm designed around the characteristics of the execution-time estimation model for DM jobs. Based on an approximate estimate of job execution time, the algorithm first maps DM jobs to machines in a decentralized and diligent (as defined in this paper) manner. The performance of this initial mapping can then be improved through job migrations when necessary. The scheduling heuristic it uses considers both the minimal completion time criterion and the critical path in the DAG. We implement this system in an established Multi-Agent System (MAS) environment, in which existing DM algorithms are reused by encapsulating them into agents. Practical classification problems are used to test and measure the system performance. The detailed experimental procedure and the analysis of the results are also discussed in this paper.
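The abstract names two ingredients of the scheduling heuristic: the critical path in the DAG and the minimal completion time criterion. The Python sketch below shows one generic way to combine them, a static list scheduler in the style of the well-known HEFT heuristic: jobs are prioritized by their upward rank (the length of the longest path to an exit job, using mean estimated execution time), and each job is placed on the machine that yields the earliest completion time. This is not the authors' algorithm: their scheduler is dynamic and decentralized and adds job migration, none of which is modelled here. The job names, the execution-time table, and the functions `upward_rank` and `schedule` are hypothetical, and data-transfer costs between machines are assumed to be zero.

```python
"""Hypothetical sketch: critical-path priority + minimal-completion-time placement.

Assumptions (not from the paper): per-(job, machine) execution-time estimates are
known in advance, communication costs are zero, scheduling is static and
centralized, and jobs are never migrated.
"""
from typing import Dict, List, Tuple


def upward_rank(succ: Dict[str, List[str]],
                exec_time: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Longest (critical) path length from each job to an exit job, using mean execution time."""
    rank: Dict[str, float] = {}

    def visit(job: str) -> float:
        if job not in rank:
            times = exec_time[job].values()
            mean = sum(times) / len(times)
            rank[job] = mean + max((visit(s) for s in succ.get(job, [])), default=0.0)
        return rank[job]

    for job in exec_time:
        visit(job)
    return rank


def schedule(succ: Dict[str, List[str]],
             exec_time: Dict[str, Dict[str, float]],
             machines: List[str]) -> Dict[str, Tuple[str, float, float]]:
    """Return job -> (machine, start, finish); the DAG must be acyclic."""
    pred: Dict[str, List[str]] = {job: [] for job in exec_time}
    for job, children in succ.items():
        for child in children:
            pred[child].append(job)

    rank = upward_rank(succ, exec_time)
    machine_free = {m: 0.0 for m in machines}  # time at which each machine becomes idle
    placement: Dict[str, Tuple[str, float, float]] = {}

    while len(placement) < len(exec_time):
        # Ready jobs: not yet placed, all predecessors placed.
        ready = [j for j in exec_time
                 if j not in placement and all(p in placement for p in pred[j])]
        # Critical-path criterion: pick the ready job with the highest upward rank.
        job = max(ready, key=lambda j: rank[j])
        data_ready = max((placement[p][2] for p in pred[job]), default=0.0)
        # Minimal completion time criterion: choose the machine that finishes this job first.
        best = min(machines,
                   key=lambda m: max(machine_free[m], data_ready) + exec_time[job][m])
        start = max(machine_free[best], data_ready)
        finish = start + exec_time[job][best]
        placement[job] = (best, start, finish)
        machine_free[best] = finish
    return placement


# Toy DM workflow: preprocessing feeds two alternative classifiers, then evaluation.
succ = {"preprocess": ["train_a", "train_b"],
        "train_a": ["evaluate"],
        "train_b": ["evaluate"]}
exec_time = {"preprocess": {"m1": 2, "m2": 4},
             "train_a": {"m1": 6, "m2": 3},
             "train_b": {"m1": 4, "m2": 8},
             "evaluate": {"m1": 1, "m2": 1}}
plan = schedule(succ, exec_time, ["m1", "m2"])
makespan = max(finish for _, _, finish in plan.values())
print(plan)
print("makespan:", makespan)  # makespan: 7.0
```

On this toy workflow the sketch places `train_b`, which lies on the critical path, before `train_a`, and runs `train_a` on the otherwise idle machine `m2`; ties between machines are broken by the order of the `machines` list.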
Pages: 177-189
Page count: 13