Blaze: A High-Performance, Scalable, and Efficient Data Transfer Framework with Configurable and Extensible Features

Cited by: 2
Authors
Marru, Suresh [1]
Freitag, Brian [2]
Wannipurage, Dimuthu [1]
Bommala, Uday Kumar [3]
Pradier, Patrick [4]
Demange, Christophe [4]
Pantha, Nishan [3]
Mukherjee, Tathagata [3]
Rosich, Betlem [5]
Monjoux, Eric [5]
Ramachandran, Rahul [2]
Affiliations
[1] Indiana Univ, Bloomington, IN 47405 USA
[2] NASA Marshall Space Flight Ctr, Huntsville, AL USA
[3] Univ Alabama Huntsville, Huntsville, AL USA
[4] GAEL Syst, Champs Sur Marne, France
[5] European Space Agcy, Rome, Italy
Keywords
data transfer; data orchestration; Airavata MFT; Blaze framework; cloud transfer; LANDSAT; SCIENCE
DOI
10.1109/CLOUD60044.2023.00016
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Blaze is a high-speed data transfer framework that enables efficient and scalable data movement between distributed storage systems. In this paper, we describe the design, implementation, and evaluation of Blaze in the context of a case study involving the transfer of 5.6 petabytes of data from OVH cloud storage to an Amazon S3 bucket. We discuss the technical challenges and design choices that led to creating a single-agent architecture, providing flexibility in agent placement while minimizing operational costs. We also demonstrate the orchestration capabilities of Blaze using Apache Airflow to manage hierarchical workflows for data transfer between storage systems. Our evaluation shows that Blaze achieved the expected 20 Gbps throughput during data transfer and provided significant cost savings compared to other architectures. The results demonstrate that Blaze is a practical solution for high-speed data transfer in large-scale distributed storage environments.
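As a rough sanity check on the figures in the abstract, the sketch below estimates how long 5.6 petabytes would take at a sustained 20 Gbps. It assumes the rate is held for the entire transfer with no protocol overhead, retries, or idle time, and it shows both the decimal (10^15 bytes) and binary (2^50 bytes) readings of "petabyte", since the abstract does not say which is meant. The function and constant names are illustrative and do not come from the paper.

```python
def transfer_days(size_bytes: float, throughput_bps: float) -> float:
    """Days needed to move size_bytes at a sustained throughput in bits/s."""
    seconds = size_bytes * 8 / throughput_bps  # bytes -> bits, then divide by rate
    return seconds / 86_400                    # 86,400 seconds per day


# Assumptions (not stated in the abstract): unit interpretation and a
# perfectly sustained link rate.
PB_DECIMAL = 10**15      # 1 PB as 10^15 bytes
PIB_BINARY = 2**50       # 1 PiB as 2^50 bytes
THROUGHPUT_BPS = 20e9    # 20 Gbps, as reported in the abstract

days_decimal = transfer_days(5.6 * PB_DECIMAL, THROUGHPUT_BPS)
days_binary = transfer_days(5.6 * PIB_BINARY, THROUGHPUT_BPS)

print(f"{days_decimal:.1f} days (decimal PB)")
print(f"{days_binary:.1f} days (binary PiB)")
```

Under these idealized assumptions the transfer takes on the order of four weeks; any real transfer would take longer once overhead and interruptions are included.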
Pages: 58-68
Number of pages: 11