Collective Computing for Scientific Big Data Analysis

Cited by: 1
Authors
Liu, Jialin [1]
Chen, Yong [1]
Byna, Surendra [2]
Affiliations
[1] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
[2] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA
Keywords
collective computing; big data; map reduce; performance
DOI
10.1109/ICPPW.2015.22
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Big-science discovery requires an efficient computing framework on high-performance computing (HPC) architectures. Traditional scientific data analysis relies on the Message Passing Interface (MPI) and MPI-IO to achieve fast computation and to relieve the I/O bottleneck. Among these techniques, two-phase collective I/O is commonly used to reduce data movement by optimizing non-contiguous I/O patterns. However, the inherent constraints of collective I/O prevent it from being flexibly combined with computation, and current HPC systems lack an efficient non-blocking I/O-computing framework. In this work, we propose Collective Computing, a framework that breaks the constraints of two-phase collective I/O and provides an efficient non-blocking computing paradigm with runtime support. The fundamental idea is to move the analysis stage earlier and insert the computation into the two-phase I/O, so that the data read in the first (I/O) phase can be computed in place and the second (shuffle) phase is minimized with a reduce operation. We motivate this idea by profiling I/O and CPU usage. Through both theoretical analysis and evaluation on a real application and benchmarks, we show that collective computing can achieve a 2.5X speedup and is promising for big scientific data analysis.
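The core idea in the abstract (computing on each aggregator's phase-one buffer and replacing the shuffle with a reduce) can be illustrated with a toy, single-process Python simulation. This is a minimal sketch under stated assumptions, not the authors' runtime: no MPI calls are made, the "file" is an in-memory list, and the function names (`file_domains`, `owners`, `two_phase_then_analyze`, `collective_computing`) are hypothetical. It further assumes that each rank requests a strided set of elements (element i goes to rank i mod n_ranks), that aggregator a is rank a, and that the analysis is a global sum, an associative operation that can be split into partial results. Cost is approximated as the number of values sent between ranks.

```python
from typing import Dict, List, Tuple


def file_domains(n: int, n_agg: int) -> List[range]:
    """Split element indices [0, n) into n_agg contiguous file domains."""
    step = (n + n_agg - 1) // n_agg
    return [range(a * step, min((a + 1) * step, n)) for a in range(n_agg)]


def owners(n: int, n_ranks: int) -> Dict[int, int]:
    """Hypothetical non-contiguous pattern: element i is requested by rank i % n_ranks."""
    return {i: i % n_ranks for i in range(n)}


def two_phase_then_analyze(data: List[float], n_ranks: int, n_agg: int) -> Tuple[float, int]:
    """Baseline: two-phase collective read, then per-rank analysis, then a reduce.

    Returns (global sum, number of values sent between ranks).
    Aggregator a is assumed to be rank a.
    """
    assert 0 < n_agg <= n_ranks
    owner = owners(len(data), n_ranks)
    received: Dict[int, List[float]] = {r: [] for r in range(n_ranks)}
    moved = 0
    for agg, dom in enumerate(file_domains(len(data), n_agg)):
        buf = [data[i] for i in dom]          # phase 1: contiguous read of the domain
        for i, value in zip(dom, buf):        # phase 2: shuffle to the requesting rank
            received[owner[i]].append(value)
            if owner[i] != agg:               # a local copy costs no communication
                moved += 1
    partials = [sum(vals) for vals in received.values()]  # analysis on each rank
    moved += n_ranks - 1                      # reduce partial sums onto rank 0
    return sum(partials), moved


def collective_computing(data: List[float], n_agg: int) -> Tuple[float, int]:
    """Sketch of the in-place idea: analyze each phase-1 buffer on its
    aggregator and replace the shuffle with a reduce of partial results."""
    assert n_agg > 0
    partials = []
    for dom in file_domains(len(data), n_agg):
        buf = [data[i] for i in dom]          # phase 1: contiguous read of the domain
        partials.append(sum(buf))             # compute in place; no shuffle
    moved = n_agg - 1                         # reduce partial sums onto one aggregator
    return sum(partials), moved


if __name__ == "__main__":
    data = [float(i) for i in range(12)]
    print(two_phase_then_analyze(data, n_ranks=4, n_agg=2))  # (66.0, 12)
    print(collective_computing(data, n_agg=2))               # (66.0, 1)
```

In this toy model, with 12 elements, 4 ranks, and 2 aggregators, both paths produce the same sum, but the baseline sends 9 values in the shuffle plus 3 partial results in the reduce, while the in-place version sends only 1 partial result. The saving relies on the analysis being decomposable into partial results; an analysis that needs each rank's fully reassembled request would still require the shuffle.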
Pages: 129-137
Page count: 9