A distributed multi-storage I/O system for data intensive scientific computing

Cited by: 1
Authors
Shen, XH [1]
Choudhary, A [1]
Affiliations
[1] Northwestern Univ, Dept Elect & Comp Engn, Ctr Parallel & Distributed Comp, Evanston, IL 60208 USA
Keywords
multi-storage I/O system; access pattern; data intensive computing
DOI
10.1016/j.parco.2003.05.009
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
More and more parallel applications run in distributed environments to take advantage of readily available, inexpensive commodity resources. For data intensive applications, employing multiple distributed storage resources has many advantages. In this paper, we present a Multi-Storage I/O System (MS-I/O) that can not only effectively manage the various distributed storage resources in the system, but also provide novel high performance storage access schemes. MS-I/O employs many state-of-the-art I/O optimizations, such as collective I/O and asynchronous I/O, along with a number of new techniques such as data location, data replication, subfile, superfile and data access history. In addition, many MS-I/O optimization schemes can work simultaneously within a single data access session, greatly improving performance. Although I/O optimization techniques can help improve performance, they also complicate the I/O system, and most of them have their limitations. Selecting accurate optimization policies therefore requires expert knowledge, which end users with little knowledge of I/O techniques may lack. The task of making I/O optimization decisions should thus be left to the I/O system itself, so that it is automatic from the user's point of view. We present a User Access Pattern data structure, associated with each dataset, that helps MS-I/O easily make accurate I/O optimization decisions. (C) 2003 Elsevier B.V. All rights reserved.
Pages: 1623-1643
Page count: 21