Data Distribution and Scheduling for Distributed Analytics Tasks

Cited by: 0
Authors
Pasteris, Stephen [1]
Wang, Shiqiang [2]
Makaya, Christian [2]
Chan, Kevin [3]
Herbster, Mark [1]
Affiliations
[1] UCL, Dept Comp Sci, London, England
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
[3] US Army, Res Lab, Adelphi, MD USA
Keywords
Data placement; Internet of Things (IoT); maximum flow problem; mobile edge computing; optimization; FLOW; ALGORITHM;
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
We consider a distributed edge computing system where we have a number of interconnected machines with limited communication bandwidth and storage capacity. Analytics tasks run on the machines, where each task runs on a single machine but may require data from multiple other machines. Every task requires a given amount of data to run, and it needs to receive all its data within a specific deadline. The application scenario is that each machine has limited storage, thus we usually cannot place the entire amount of data for a specific task on a single machine that executes the task. We assume that the task execution is sparse in time, so that at most one task is executed in the system at any time. The problem we study in this paper is how to distribute the data on machines in the system, without violating the bandwidth and storage constraints, while ensuring that the data transfer deadlines are met. We prove that the optimal solution to this problem is equivalent to that of a max-flow problem on a specifically constructed graph. We present how to construct this graph so that the problem can be solved using standard algorithms for max-flow problems, and also provide some numerical results and further discussions.
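The abstract does not describe how the graph is constructed, so the following is only a minimal sketch of the general idea of checking feasibility with a standard max-flow routine (Edmonds-Karp). Everything in the toy network is an assumption made for illustration: the node names `src`, `m0`, `m1`, `m2`, and `task`, the single-task setting, the use of storage capacity on source edges, and the use of bandwidth multiplied by the deadline on inter-machine edges. It is not the construction proved in the paper.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest residual paths until none remain."""
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total

        # Find the bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical toy instance (all numbers are assumptions):
# the task runs on m0; m1 and m2 hold the rest of its data.
network = {
    # src -> machine: storage capacity of that machine (data units)
    "src": {"m0": 4, "m1": 5, "m2": 3},
    # machine -> m0: link bandwidth times deadline (data units deliverable in time)
    "m1": {"m0": 3},
    "m2": {"m0": 2},
    # m0 -> task: data held on or delivered to m0 is available to the task
    "m0": {"task": float("inf")},
}

required_data = 8  # hypothetical amount of data the task needs
deliverable = max_flow(network, "src", "task")
feasible = deliverable >= required_data
print(deliverable, feasible)
```

In this toy instance the task is feasible when the maximum flow is at least the required amount of data; a library routine for max-flow could replace the hand-written function.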
Pages: 6