Efficient Multi-site Data Movement Using Constraint Programming for Data Hungry Science

Cited by: 1
Authors
Zerola, Michal [1]
Lauret, Jerome [2]
Bartak, Roman [3]
Sumbera, Michal [1]
Affiliations
[1] Acad Sci Czech Republic, Inst Nucl Phys, Prague, Czech Republic
[2] Brookhaven Natl Lab, Upton, NY USA
[3] Charles Univ Prague, Fac Math & Phys, CR-11636 Prague 1, Czech Republic
Source
17TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP09) | 2010, Vol. 219
DOI
10.1088/1742-6596/219/6/062069
Chinese Library Classification number
O57 [Nuclear physics, high-energy physics];
Discipline classification code
070202;
Abstract
For the past decade, HENP experiments have been moving towards a distributed computing model in an effort to process tasks concurrently over enormous data sets that keep growing in size over time. To make the best use of all available, geographically spread resources and to minimize the processing time, the question of efficient data transfer and placement must also be addressed. A key question is whether the time penalty for moving the data to the computational resources is worth the presumed gain. As a step towards truly distributed task scheduling, we present a technique based on a Constraint Programming (CP) approach. The CP technique schedules data transfers from multiple resources, considering all available paths with diverse characteristics (capacity, sharing and storage), with the minimum user waiting time as the objective. We introduce a model for planning data transfers to a single destination (data transfer) as well as its extension for an optimal data set spreading strategy (data placement). Several enhancements to the solver of the CP model are shown, leading to faster schedule computation through symmetry breaking, branch cutting, well-studied principles from the job-shop scheduling field, and several heuristics. Finally, we present the design and implementation of a cornerstone application aimed at moving data sets according to the schedule. Results include a comparison of performance and trade-offs between CP techniques and a Peer-2-Peer model from a simulation framework, as well as a real-case scenario taken from practical usage of a CP scheduler.
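To make the scheduling idea in the abstract concrete, the following is a minimal sketch and not the authors' CP model. It assumes a much simpler setting: every file is replicated at one or more source sites, each site has a single direct link to the destination that sends one file at a time, and the objective is the finishing time of the last transfer. The function name `plan_transfers`, the site names, and the per-file link times are all hypothetical. The search is plain branch and bound, where a partial plan is cut as soon as it cannot beat the best complete plan found so far; shared links, storage limits, and multi-hop paths are ignored.

```python
from typing import Dict, List, Tuple


def plan_transfers(
    file_sources: Dict[str, List[str]],
    link_time: Dict[str, int],
) -> Tuple[int, Dict[str, str]]:
    """Pick one source site per file so the last transfer ends as early as possible.

    file_sources maps a file name to the sites holding a replica (hypothetical names).
    link_time maps a site to the time its link needs to send one file (assumed constant).
    """
    # Place the files with the fewest replicas first: they leave the least choice.
    files = sorted(file_sources, key=lambda f: len(file_sources[f]))
    load = {site: 0 for site in link_time}  # busy time accumulated on each link
    plan: Dict[str, str] = {}
    best = {"makespan": float("inf"), "plan": {}}

    def search(index: int, makespan: int) -> None:
        # Branch cutting: this partial plan is already no better than the best one.
        if makespan >= best["makespan"]:
            return
        if index == len(files):
            best["makespan"], best["plan"] = makespan, dict(plan)
            return
        name = files[index]
        for site in file_sources[name]:
            load[site] += link_time[site]
            plan[name] = site
            search(index + 1, max(makespan, load[site]))
            # Undo the choice before trying the next source site.
            load[site] -= link_time[site]
            del plan[name]

    search(0, 0)
    return best["makespan"], best["plan"]


# Hypothetical input: four files, two source sites, link times in arbitrary units.
example_sources = {
    "f1": ["site_a", "site_b"],
    "f2": ["site_a"],
    "f3": ["site_a", "site_b"],
    "f4": ["site_b"],
}
example_link_time = {"site_a": 2, "site_b": 4}

if __name__ == "__main__":
    makespan, chosen = plan_transfers(example_sources, example_link_time)
    print(makespan, chosen)
```

Exhaustive search of this kind grows exponentially with the number of files, which is why the abstract stresses symmetry breaking, job-shop scheduling principles, and heuristics for the real solver.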
Pages: 10
Related papers
50 in total
  • [1] Using Constraint Programming to Plan Efficient Data Movement on the Grid
    Zerola, Michal
    Sumbera, Michal
    Bartak, Roman
    Lauret, Jerome
    ICTAI: 2009 21ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, 2009, : 729 - +
  • [2] Efficient watershed modeling using a multi-site weather generator for meteorological data
    Khalili, M.
    Leconte, R.
    Brissette, F.
    GEO-ENVIRONMENT AND LANDSCAPE EVOLUTION II: EVOLUTION, MONITORING, SIMULATION, MANAGEMENT AND REMEDIATION OF THE GEOLOGICAL ENVIRONMENT AND LANDSCAPE, 2006, 89 : 273 - +
  • [3] Efficient stochastic generation of multi-site synthetic precipitation data
    Brissette, F. P.
    Khalili, M.
    Leconte, R.
    JOURNAL OF HYDROLOGY, 2007, 345 (3-4) : 121 - 133
  • [4] Multi-site Retrieval of Declustered Data
    Tosun, Ali Saman
    28TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, VOLS 1 AND 2, PROCEEDINGS, 2008, : 486 - 493
  • [5] On multi-site damage identification using single-site training data
    Barthorpe, R. J.
    Manson, G.
    Worden, K.
    JOURNAL OF SOUND AND VIBRATION, 2017, 409 : 43 - 64
  • [6] Reading Profiles in Multi-Site Data With Missingness
    Eckert, Mark A.
    Vaden, Kenneth I., Jr.
    Gebregziabher, Mulugeta
    FRONTIERS IN PSYCHOLOGY, 2018, 9
  • [7] Harmonization of multi-site MRS data with ComBat
    Bell, Tiffany K.
    Godfrey, Kate J.
    Ware, Ashley L.
    Yeates, Keith Owen
    Harris, Ashley D.
    NEUROIMAGE, 2022, 257
  • [8] Learning with multi-site fMRI graph data
    Castrillon, J. Gabriel
    Ahmadi, Ahmad
    Navab, Nassir
    Richiardi, Jonas
    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2014, : 608 - 612
  • [9] Multi-site evaluation of terrestrial evaporation models using FLUXNET data
    Ershadi, A.
    McCabe, M. F.
    Evans, J. P.
    Chaney, N. W.
    Wood, E. F.
    AGRICULTURAL AND FOREST METEOROLOGY, 2014, 187 : 46 - 61
  • [10] Preserving data privacy when using multi-site data to estimate individualized treatment rules
    Danieli, Coraline
    Moodie, Erica E. M.
    STATISTICS IN MEDICINE, 2022, 41 (09) : 1627 - 1643