Data Distribution and Scheduling for Distributed Analytics Tasks

Cited by: 0
Authors
Pasteris, Stephen [1]
Wang, Shiqiang [2]
Makaya, Christian [2]
Chan, Kevin [3]
Herbster, Mark [1]
Affiliations
[1] UCL, Dept Comp Sci, London, England
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
[3] US Army, Res Lab, Adelphi, MD USA
Keywords
Data placement; Internet of Things (IoT); maximum flow problem; mobile edge computing; optimization; FLOW; ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We consider a distributed edge computing system consisting of a number of interconnected machines with limited communication bandwidth and storage capacity. Analytics tasks run on the machines; each task runs on a single machine but may require data from multiple other machines. Every task requires a given amount of data to run and must receive all of its data within a specific deadline. Because each machine has limited storage, we usually cannot place all the data for a specific task on the single machine that executes it. We assume that task execution is sparse in time, so that at most one task is executed in the system at any time. The problem we study in this paper is how to distribute the data across the machines in the system, without violating the bandwidth and storage constraints, while ensuring that the data transfer deadlines are met. We prove that the optimal solution to this problem is equivalent to that of a max-flow problem on a specifically constructed graph. We show how to construct this graph so that the problem can be solved using standard max-flow algorithms, and we also provide some numerical results and further discussion.
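The abstract does not spell out the graph construction, so the following is only an illustrative sketch of how a placement question of this kind can be phrased as a max-flow problem. It is not the authors' construction. It assumes a single task and a hypothetical three-layer graph: a source feeds each machine with capacity equal to that machine's storage, each machine feeds the task with capacity equal to the amount of data it can deliver before the deadline (bandwidth times deadline, here given directly as `transfer_cap`), and the task feeds the sink with capacity equal to the task's data requirement. The names `max_flow`, `deliverable_data`, `storage`, and `transfer_cap` are all made up for this example. The solver is a plain Edmonds-Karp implementation.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest residual paths found by BFS."""
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    total = 0
    while True:
        # BFS from the source to find an augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push flow along the path and update reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def deliverable_data(storage, transfer_cap, host, demand):
    """Hypothetical single-task model (an assumption, not the paper's graph).

    storage[m]      -- data units machine m can store
    transfer_cap[m] -- data units m can send to the host before the deadline
    host            -- machine that executes the task (local data is not
                       limited by bandwidth, only by the host's storage)
    demand          -- data units the task needs
    Returns the largest amount of the task's data that can arrive in time.
    """
    capacity = {"source": {}, "task": {"sink": demand}}
    for m, s in storage.items():
        node = ("machine", m)
        capacity["source"][node] = s
        limit = s if m == host else transfer_cap[m]
        capacity[node] = {"task": limit}
    return max_flow(capacity, "source", "sink")


storage = {"A": 4, "B": 3, "C": 5}
transfer_cap = {"B": 2, "C": 3}  # bandwidth * deadline toward host "A"
print(deliverable_data(storage, transfer_cap, host="A", demand=8))
print(deliverable_data(storage, transfer_cap, host="A", demand=10))
```

Under these assumptions, a placement that meets the deadline exists exactly when the returned flow equals `demand`. The flow on each source-to-machine edge then gives the amount of the task's data to store on that machine.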
Pages: 6