Scalable I/O and analytics

Cited by: 14
Authors
Choudhary, Alok [1]
Liao, Wei-keng [1]
Gao, Kui [1]
Nisar, Arifa [1]
Ross, Robert [2]
Thakur, Rajeev [2]
Latham, Robert [2]
Affiliations
[1] Northwestern Univ, Dept Elect Engn & Comp Sci, Evanston, IL 60208 USA
[2] Argonne Natl Lab, Div Math & Comp Sci, Argonne, IL 60439 USA
Funding
National Science Foundation (USA)
DOI
10.1088/1742-6596/180/1/012048
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
High-performance computing systems have already approached peta-scale with hundreds of thousands of processors/cores in many deployments. These systems promise a new level of predictive and knowledge discovery ability as researchers gain the capability to model dependencies between phenomena at scales not seen earlier. These applications are highly I/O and data intensive, leading scientists to observe that performing I/O and subsequent analyses are major bottlenecks in effectively utilizing peta-scale systems and a major hurdle in accelerating discoveries. Although significant progress has been made in performance, interfaces, and middleware runtime systems for I/O in the recent past, significantly more research and development needs to be carried out to scale the performance to the desired levels for systems containing tens to hundreds of thousands of cores. In this work we outline our recent achievements and current research for designing scalable I/O software and enabling data analytics in storage systems. We also enumerate key challenges for the I/O systems and discuss ongoing efforts that address these challenges.
Pages: 10