Collective Computing for Scientific Big Data Analysis

Cited by: 1
Authors
Liu, Jialin [1]
Chen, Yong [1]
Byna, Surendra [2]
Affiliations
[1] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
[2] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA
Keywords
collective computing; big data; map reduce; performance
DOI
10.1109/ICPPW.2015.22
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Big-science discovery requires an efficient computing framework on high-performance computing (HPC) architectures. Traditional scientific data analysis relies on the Message Passing Interface (MPI) and MPI-IO to achieve fast computation and to reduce I/O bottlenecks. Within MPI-IO, two-phase collective I/O is commonly used to reduce data movement by optimizing non-contiguous I/O patterns. However, the inherent constraints of collective I/O prevent it from being combined flexibly with computation, and current HPC systems lack an efficient non-blocking I/O-computing framework. In this work, we propose Collective Computing, a framework that breaks the constraints of two-phase collective I/O and provides an efficient non-blocking computing paradigm with runtime support. The fundamental idea is to move the analysis stage earlier and insert the computation into the two-phase I/O, so that the data read in the first I/O phase can be computed in place and the second (shuffle) phase is minimized by a reduce operation. We motivate this idea by profiling I/O and CPU usage. Through both theoretical analysis and evaluation on a real application and benchmarks, we show that collective computing can achieve a 2.5X speedup and is promising for big scientific data analysis.
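The idea in the abstract can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation and not MPI code: the function names, the interleaved request pattern, and the choice of a sum as the analysis are all hypothetical. It compares how many values cross the shuffle phase when aggregators forward raw data versus when they compute partial results in place and forward only those.

```python
from collections import defaultdict


def _owner_map(requests):
    # Map each requested file offset to the rank that asked for it.
    return {idx: rank for rank, idxs in requests.items() for idx in idxs}


def _file_domains(n_elements, n_aggregators):
    # Split the file into contiguous domains, one per aggregator.
    chunk = -(-n_elements // n_aggregators)  # ceiling division
    return [range(a * chunk, min((a + 1) * chunk, n_elements))
            for a in range(n_aggregators)]


def two_phase_then_compute(data, requests, n_aggregators):
    """Baseline: read contiguously, shuffle raw elements, then compute."""
    owner = _owner_map(requests)
    inbox = defaultdict(list)
    moved = 0  # number of raw elements sent in the shuffle phase
    for domain in _file_domains(len(data), n_aggregators):
        for idx in domain:
            if idx in owner:
                inbox[owner[idx]].append(data[idx])
                moved += 1
    return {rank: sum(inbox[rank]) for rank in requests}, moved


def collective_computing(data, requests, n_aggregators):
    """Sketch: compute partial sums at the aggregator, shuffle only those."""
    owner = _owner_map(requests)
    partials = defaultdict(int)  # (aggregator, rank) -> partial sum
    for a, domain in enumerate(_file_domains(len(data), n_aggregators)):
        for idx in domain:
            if idx in owner:
                partials[(a, owner[idx])] += data[idx]
    # Reduce step: each rank combines one partial value per aggregator.
    results = {rank: 0 for rank in requests}
    for (_, rank), value in partials.items():
        results[rank] += value
    return results, len(partials)


# Hypothetical workload: 16 elements, two ranks with interleaved requests.
data = list(range(16))
requests = {0: list(range(0, 16, 2)), 1: list(range(1, 16, 2))}
baseline, moved_raw = two_phase_then_compute(data, requests, 2)
fused, moved_partial = collective_computing(data, requests, 2)
print(baseline == fused, moved_raw, moved_partial)  # True 16 4
```

In this toy setting both paths give the same per-rank sums, but the shuffle carries 4 partial values instead of 16 raw elements. The saving depends on the analysis being expressible as a reduction; the sketch says nothing about the non-blocking runtime or the measured speedup reported in the abstract.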
Pages: 129-137
Page count: 9
Related papers
50 in total
  • [21] Big Data Forex Analysis using GPU Computing
    Das, Lyla B.
    Arun, C.
    Sunny, John K.
    PROCEEDINGS ON 2018 IEEE 3RD INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND SECURITY (ICCCS), 2018 : 14 - 19
  • [22] Exploring Big Data Analysis: Fundamental Scientific Problems
    Xu Z.
    Shi Y.
    Annals of Data Science, 2015, 2 (04) : 363 - 372
  • [23] Sales Data Analysis of Cloud Computing Products based on Big Data
    Zhang, Xu
    He, Yumin
    Pan, Lixin
    Yao, Zhong
    IFAC PAPERSONLINE, 2022, 55 (10) : 1404 - 1409
  • [24] Research on the Development of Data Scientific Analysis Tools in the Big Data Age
    Xu, Yuxuan
    PROCEEDINGS OF THE 2017 3RD INTERNATIONAL CONFERENCE ON ECONOMICS, SOCIAL SCIENCE, ARTS, EDUCATION AND MANAGEMENT ENGINEERING (ESSAEME 2017), 2017, 119 : 2021 - 2025
  • [25] Cloud Computing and Big Data
    Hsu, Ching-Hsien
    Tang, Chunming
    Esteves, Rui M.
    JOURNAL OF INTERNET TECHNOLOGY, 2014, 15 (06) : 995 - 997
  • [26] Big data and cloud computing
    Shrestha, Rasu B.
    APPLIED RADIOLOGY, 2014, 43 (03) : 32 - 34
  • [27] Multimedia Big Data Computing
    Zhu, Wenwu
    Cui, Peng
    Wang, Zhi
    Hua, Gang
    IEEE MULTIMEDIA, 2015, 22 (03) : 96 - 105
  • [28] Exascale Computing and Big Data
    Reed, Daniel A.
    Dongarra, Jack
    COMMUNICATIONS OF THE ACM, 2015, 58 (07) : 56 - 68
  • [29] The anatomy of big data computing
    Kune, Raghavendra
    Konugurthi, Pramod Kumar
    Agarwal, Arun
    Chillarige, Raghavendra Rao
    Buyya, Rajkumar
    SOFTWARE-PRACTICE & EXPERIENCE, 2016, 46 (01) : 79 - 105
  • [30] Big data for Scientific Knowledge
    Canals, Agusti
    Lopez-Borrull, Alexandre
    PROCEEDINGS OF THE 18TH EUROPEAN CONFERENCE ON KNOWLEDGE MANAGEMENT (ECKM 2017), VOLS 1 AND 2, 2017 : 197 - 205