Collective Computing for Scientific Big Data Analysis

Cited by: 1
Authors
Liu, Jialin [1]
Chen, Yong [1]
Byna, Surendra [2]
Affiliations
[1] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
[2] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA
Keywords
collective computing; big data; map reduce; performance
DOI
10.1109/ICPPW.2015.22
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Big science discovery requires an efficient computing framework on high performance computing (HPC) architectures. Traditional scientific data analysis relies on the Message Passing Interface (MPI) and MPI-IO to achieve fast computation and reduce the I/O bottleneck. Within MPI-IO, two-phase collective I/O is commonly used to reduce data movement by optimizing non-contiguous I/O patterns. However, the inherent constraints of collective I/O prevent it from being combined flexibly with computation, and current HPC systems lack an efficient non-blocking I/O-computing framework. In this work, we propose Collective Computing, a framework that breaks the constraints of two-phase collective I/O and provides an efficient non-blocking computing paradigm with runtime support. The fundamental idea is to move the analysis stage earlier and insert the computation into the two-phase I/O, so that the data read in the first I/O phase can be processed in place and the second (shuffle) phase is minimized with a reduce operation. We motivate this idea by profiling I/O and CPU usage. With both theoretical analysis and evaluation using real-application and benchmark workloads, we show that collective computing can achieve a 2.5X speedup and is promising for big scientific data analysis.
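The idea in the abstract, computing on the data held by aggregators after the first I/O phase and shuffling only reduced partial results, can be illustrated with a small pure-Python simulation. This is a sketch under stated assumptions, not the authors' implementation: it does not use MPI, the analysis operation is assumed to be a sum, the access pattern is assumed to be strided, and all function names (`make_requests`, `file_domains`, `two_phase_then_compute`, `collective_computing`) are hypothetical. It only counts how many values cross the shuffle phase in each scheme.

```python
from collections import defaultdict


def make_requests(n_procs, n_elems):
    """Assumed non-contiguous pattern: process p wants elements p, p+n_procs, ..."""
    return {p: list(range(p, n_elems, n_procs)) for p in range(n_procs)}


def file_domains(n_aggs, n_elems):
    """Split the file into contiguous domains, one per (simulated) aggregator."""
    size = (n_elems + n_aggs - 1) // n_aggs
    return [range(a * size, min((a + 1) * size, n_elems)) for a in range(n_aggs)]


def two_phase_then_compute(data, requests, domains):
    """Baseline: aggregators read their domain, shuffle raw elements to the
    requesting processes, and the analysis (a sum) runs only afterwards."""
    received = defaultdict(list)
    shuffled = 0  # number of values moved in the shuffle phase
    for dom in domains:                      # phase 1: contiguous read
        for p, offsets in requests.items():  # phase 2: shuffle raw data
            chunk = [data[i] for i in offsets if i in dom]
            received[p].extend(chunk)
            shuffled += len(chunk)
    results = {p: sum(vals) for p, vals in received.items()}
    return results, shuffled


def collective_computing(data, requests, domains):
    """Sketch of the proposed idea: each aggregator computes a partial sum in
    place for every requesting process, and only that partial result is
    shuffled and reduced at the destination."""
    results = defaultdict(int)
    shuffled = 0
    for dom in domains:
        for p, offsets in requests.items():
            chunk = [data[i] for i in offsets if i in dom]
            if chunk:
                results[p] += sum(chunk)  # in-place compute, then reduce
                shuffled += 1             # one partial value instead of len(chunk)
    return dict(results), shuffled


if __name__ == "__main__":
    data = list(range(16))
    reqs = make_requests(n_procs=4, n_elems=16)
    doms = file_domains(n_aggs=2, n_elems=16)
    print(two_phase_then_compute(data, reqs, doms))
    print(collective_computing(data, reqs, doms))
```

With 16 elements, 4 processes, and 2 aggregators, both schemes produce the same per-process sums, but the baseline moves 16 values in the shuffle phase while the sketch moves 8 partial results. The saving in this toy model grows with the number of elements each process requests per aggregator domain; it says nothing about the 2.5X figure reported in the abstract, which comes from the authors' own evaluation.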
Pages: 129-137
Number of pages: 9