Efficient Multi-site Data Movement Using Constraint Programming for Data Hungry Science

Cited by: 1
Authors
Zerola, Michal [1]
Lauret, Jerome [2]
Bartak, Roman [3]
Sumbera, Michal [1]
Affiliations
[1] Acad Sci Czech Republic, Inst Nucl Phys, Prague, Czech Republic
[2] Brookhaven Natl Lab, Upton, NY USA
[3] Charles Univ Prague, Fac Math & Phys, CR-11636 Prague 1, Czech Republic
Source
17TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP09) | 2010 / Vol. 219
DOI
10.1088/1742-6596/219/6/062069
Chinese Library Classification (CLC) number
O57 [Nuclear physics, high-energy physics]
Discipline classification code
070202
Abstract
For the past decade, High Energy and Nuclear Physics (HENP) experiments have been moving toward a distributed computing model in an effort to process tasks concurrently over enormous data sets that keep growing in size over time. To make optimal use of all available, geographically distributed resources and to minimize processing time, one must also address the question of efficient data transfer and placement. A key question is whether the time penalty of moving the data to the computational resources is worth the presumed gain. As a step toward truly distributed task scheduling, we present a technique based on Constraint Programming (CP). The CP technique schedules data transfers from multiple resources, considering all available paths with diverse characteristics (capacity, sharing, and storage), with minimizing the user's waiting time as the objective. We introduce a model for planning data transfers to a single destination (data transfer) as well as its extension to an optimal strategy for spreading a data set (data placement). We show several enhancements to the solver for the CP model that reduce schedule computation time, using symmetry breaking, branch cutting, well-studied principles from the job-shop scheduling field, and several heuristics. Finally, we present the design and implementation of a cornerstone application that moves data sets according to the schedule. Results include a comparison of performance and trade-offs between CP techniques and a Peer-to-Peer model from a simulation framework, as well as a real-case scenario taken from practical use of a CP scheduler.
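The abstract describes planning the transfer of a data set to a single destination from several sites so that the user waits as little as possible. The Python sketch below only illustrates that objective under simplifying assumptions of our own, not the authors' CP model: each source site has one dedicated link to the destination, a link carries one file at a time, and a file's transfer time is its size divided by the link bandwidth; multi-hop paths, link sharing, and storage limits are ignored. Instead of a CP solver it uses a plain depth-first branch and bound, whose pruning step is a simple form of branch cutting. The function `plan_transfers`, the file names, and the site names are hypothetical.

```python
# Hypothetical toy planner, NOT the paper's CP model. Assumptions: each source has
# one dedicated link to the destination, a link moves one file at a time, and
# transfer time = size / bandwidth. Objective: makespan (when the last file arrives).
from typing import Dict, List, Tuple


def plan_transfers(
    files: Dict[str, float],         # file name -> size (e.g. GB)
    replicas: Dict[str, List[str]],  # file name -> sources holding a copy
    bandwidth: Dict[str, float],     # source -> link bandwidth (GB per time unit)
) -> Tuple[float, Dict[str, str]]:
    """Return (best makespan, file -> chosen source) via depth-first branch and bound."""
    # Branch on larger files first; good complete plans tend to be found early.
    order = sorted(files, key=lambda f: files[f], reverse=True)
    load = {src: 0.0 for src in bandwidth}  # busy time accumulated on each link
    best_makespan = float("inf")
    best_assign: Dict[str, str] = {}
    assign: Dict[str, str] = {}

    def search(i: int) -> None:
        nonlocal best_makespan, best_assign
        current = max(load.values())
        # Branch cutting: a partial plan already no faster than the best
        # complete plan cannot lead to an improvement.
        if current >= best_makespan:
            return
        if i == len(order):
            best_makespan, best_assign = current, dict(assign)
            return
        name = order[i]
        for src in replicas[name]:
            prev = load[src]
            load[src] = prev + files[name] / bandwidth[src]
            assign[name] = src
            search(i + 1)
            load[src] = prev  # restore exactly, avoiding float drift
            del assign[name]

    search(0)
    return best_makespan, best_assign


if __name__ == "__main__":
    files = {"a.root": 4.0, "b.root": 2.0, "c.root": 2.0}
    replicas = {
        "a.root": ["site_A", "site_B"],
        "b.root": ["site_A"],
        "c.root": ["site_A", "site_B"],
    }
    bandwidth = {"site_A": 2.0, "site_B": 1.0}
    print(plan_transfers(files, replicas, bandwidth))
    # (3.0, {'a.root': 'site_A', 'b.root': 'site_A', 'c.root': 'site_B'})
```

Apart from pruning, this search is exhaustive and grows exponentially with the number of files; the symmetry breaking and heuristics named in the abstract are aimed at reducing this kind of search effort, which the sketch does not attempt.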
Pages: 10