Efficient Multi-site Data Movement Using Constraint Programming for Data Hungry Science

Times cited: 1
Authors
Zerola, Michal [1]
Lauret, Jerome [2]
Bartak, Roman [3]
Sumbera, Michal [1]
Affiliations
[1] Acad Sci Czech Republic, Inst Nucl Phys, Prague, Czech Republic
[2] Brookhaven Natl Lab, Upton, NY USA
[3] Charles Univ Prague, Fac Math & Phys, CR-11636 Prague 1, Czech Republic
Source
17TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP09), 2010, Vol. 219
DOI
10.1088/1742-6596/219/6/062069
Chinese Library Classification (CLC) number
O57 [Nuclear physics; high-energy physics]
Subject classification code
070202
Abstract
For the past decade, High Energy and Nuclear Physics (HENP) experiments have been moving toward a distributed computing model so that tasks can be processed concurrently over enormous data sets whose size keeps growing over time. To make optimal use of all available, geographically distributed resources and to minimize processing time, the question of efficient data transfer and placement must also be addressed. A key question is whether the time penalty of moving the data to the computational resources is worth the presumed gain. As a step toward truly distributed task scheduling, we present a technique based on Constraint Programming (CP). The CP technique schedules data transfers from multiple resources. It considers all available paths with diverse characteristics (capacity, sharing, and storage), and its objective is to minimize the user's waiting time. We introduce a model for planning data transfers to a single destination (data transfer), as well as its extension to an optimal strategy for spreading a data set (data placement). We show several enhancements to the solver for the CP model that reduce schedule computation time: symmetry breaking, branch cutting, well-studied principles from the job-shop scheduling field, and several heuristics. Finally, we present the design and implementation of a cornerstone application that moves data sets according to the schedule. Results include a comparison of performance and trade-offs between the CP technique and a Peer-to-Peer (P2P) model within a simulation framework. They also cover a real-case scenario taken from practical usage of a CP scheduler.
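The sketch below illustrates the single-destination transfer-planning idea from the abstract. It is deliberately simplified, and none of its assumptions come from the paper. Each source site is assumed to reach the destination over one dedicated link with a fixed bandwidth. Transfers that share a link are assumed to run one after another. The objective is the makespan, meaning the time at which the last file arrives. The search is a plain depth-first branch-and-bound in Python, not a CP solver and not the authors' model. Its pruning step only loosely mirrors the branch cutting named in the abstract, and ordering large files first is a generic heuristic. The function name `schedule`, the site names, and all sizes and bandwidths are hypothetical.

```python
from typing import Dict, List, Tuple


def schedule(files: Dict[str, float],
             replicas: Dict[str, List[str]],
             bandwidth: Dict[str, float]) -> Tuple[float, Dict[str, str]]:
    """Pick one source site per file so the last transfer finishes as early as possible.

    Hypothetical toy model (not the paper's CP model):
    files:     file name -> size in MB
    replicas:  file name -> source sites holding a copy
    bandwidth: source site -> bandwidth (MB/s) of its single link to the destination
    Returns (makespan in seconds, file -> chosen site), or (inf, {}) if infeasible.
    """
    # Heuristic: branch on the largest files first so good bounds appear early.
    order = sorted(files, key=lambda f: -files[f])
    load = {site: 0.0 for site in bandwidth}  # MB queued on each link
    plan: Dict[str, str] = {}
    best_cost = float("inf")
    best_plan: Dict[str, str] = {}

    def makespan() -> float:
        # Transfers on one link are sequential, so a link finishes at load / bandwidth.
        return max(load[s] / bandwidth[s] for s in load)

    def search(i: int) -> None:
        nonlocal best_cost, best_plan
        current = makespan()
        # Pruning: loads only grow, so this partial plan can no longer beat the incumbent.
        if current >= best_cost:
            return
        if i == len(order):
            best_cost, best_plan = current, dict(plan)
            return
        f = order[i]
        for site in replicas[f]:
            load[site] += files[f]
            plan[f] = site
            search(i + 1)
            load[site] -= files[f]  # undo the choice before trying the next site
            del plan[f]

    search(0)
    return best_cost, best_plan


if __name__ == "__main__":
    # Hypothetical instance: file "b" exists only at site_A.
    files = {"a": 300.0, "b": 200.0, "c": 100.0}
    replicas = {"a": ["site_A", "site_B"], "b": ["site_A"], "c": ["site_A", "site_B"]}
    bandwidth = {"site_A": 100.0, "site_B": 50.0}
    print(schedule(files, replicas, bandwidth))
```

On this toy instance the search returns a makespan of 5.0 s by fetching file "c" over the slower site_B link. Pulling every file through the faster site_A link takes 6.0 s instead. The model described in the abstract also accounts for path sharing, storage and multiple paths, which this sketch ignores.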
Pages: 10
Related papers
50 in total
  • [11] Design for the distributed data locator service for multi-site data repositories
    Nakanishi, H.
    Yamanaka, K.
    Tokunaga, S.
    Ozeki, T.
    Homma, Y.
    Ohtsu, H.
    Ishii, Y.
    Nakajima, N.
    Yamamoto, T.
    Emoto, M.
    Ohsuna, M.
    Ito, T.
    Imazu, S.
    Nonomura, M.
    Yoshida, M.
    Ogawa, H.
    Maeno, H.
    Aoyagi, M.
    Yokota, M.
    Inoue, T.
    Nakamura, O.
    Abe, S.
    Urushidani, S.
    FUSION ENGINEERING AND DESIGN, 2021, 165
  • [12] Multi-site, multi-pollutant atmospheric data analysis using Riemannian geometry
    Smith, Alexander
    Hua, Jinxi
    de Foy, Benjamin
    Schauer, James J.
    Zavala, Victor M.
    SCIENCE OF THE TOTAL ENVIRONMENT, 2023, 892
  • [13] Stochastic multi-site generation of daily weather data
    Khalili, Malika
    Brissette, Francois
    Leconte, Robert
    STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT, 2009, 23 (06) : 837 - 849
  • [14] Harmonization of multi-site diffusion tensor imaging data
    Fortin, Jean-Philippe
    Parker, Drew
    Tunc, Birkan
    Watanabe, Takanori
    Elliott, Mark A.
    Ruparel, Kosha
    Roalf, David R.
    Satterthwaite, Theodore D.
    Gur, Ruben C.
    Gur, Raquel E.
    Schultz, Robert T.
    Verma, Ragini
    Shinohara, Russell T.
    NEUROIMAGE, 2017, 161 : 149 - 170
  • [15] Clustered data storage for multi-site fusion experiments
    National Institute for Fusion Science, 322-6 Oroshi-cho, Toki 509-5292, Japan
    Unknown, 816-8580, Japan
    Unknown, 305-8577, Japan
    Plasma Fusion Res.
  • [17] Incorporating multi-event and multi-site data in the calibration of SWMM
    Arriero Shinma, T.
    Ribeiro Reis, L. F.
    12TH INTERNATIONAL CONFERENCE ON COMPUTING AND CONTROL FOR THE WATER INDUSTRY, CCWI2013, 2014, 70 : 75 - 84
  • [18] Multi-Site and Multi-Pollutant Air Quality Data Modeling
    Hu, Min
    Liu, Bin
    Yin, Guosheng
    SUSTAINABILITY, 2024, 16 (01)
  • [19] Efficient generation of test data structures using constraint logic programming and program transformation
    Fioravanti, Fabio
    Proietti, Maurizio
    Senni, Valerio
    JOURNAL OF LOGIC AND COMPUTATION, 2015, 25 (06) : 1263 - 1283
  • [20] Using Ecometric Data to Explore Sources of Cross-Site Impact Variance in Multi-Site Trials
    Judkins, David R.
    Durham, Gabriel
    EVALUATION REVIEW, 2024, 48 (02) : 274 - 311