A distributed multi-storage I/O system for data intensive scientific computing

Authors
Shen, XH [1]
Choudhary, A [1]
Affiliations
[1] Northwestern Univ, Dept Elect & Comp Engn, Ctr Parallel & Distributed Comp, Evanston, IL 60208 USA
Keywords
multi-storage I/O system; access pattern; data intensive computing
DOI
10.1016/j.parco.2003.05.009
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
More and more parallel applications run in distributed environments to take advantage of readily available, inexpensive commodity resources. For data-intensive applications, employing multiple distributed storage resources has many advantages. In this paper, we present a Multi-Storage I/O System (MS-I/O) that not only manages the various distributed storage resources in the system effectively, but also provides novel high-performance storage access schemes. MS-I/O employs many state-of-the-art I/O optimizations, such as collective I/O and asynchronous I/O, along with a number of new techniques, such as data location, data replication, subfile, superfile, and data access history. In addition, many MS-I/O optimization schemes can work simultaneously within a single data access session, greatly improving performance. Although I/O optimization techniques can help improve performance, they also complicate the I/O system, and most of them have limitations. Selecting accurate optimization policies therefore requires expert knowledge, which end users with little knowledge of I/O techniques may lack. The task of making I/O optimization decisions should thus be left to the I/O system itself, so that it is automatic from the user's point of view. We present a User Access Pattern data structure, associated with each dataset, that helps MS-I/O easily make accurate I/O optimization decisions. © 2003 Elsevier B.V. All rights reserved.
Pages: 1623-1643
Number of pages: 21