Efficient retrieval of multidimensional datasets through parallel I/O

Cited by: 4
Authors
Prabhakar, S [1]
Abdel-Ghaffar, K [1]
Agrawal, D [1]
El Abbadi, A [1]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
DOI
10.1109/HIPC.1998.738011
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Many scientific and engineering applications process large multidimensional datasets. An important access pattern for these applications is the retrieval of data corresponding to ranges of values in multiple dimensions. Performance is limited by the disks, largely because of high disk latencies. Tiling the data and distributing the tiles across multiple disks is an effective technique for improving performance through parallel I/O. The distribution of tiles across the disks is an important factor in achieving gains. Several schemes for declustering multidimensional data to improve the performance of range queries have been proposed in the literature. We extend the class of Cyclic schemes, which were developed earlier for two-dimensional data, to multiple dimensions. We establish important properties of Cyclic schemes, based upon which we reduce the search space for determining good declustering schemes within the class of Cyclic schemes. Through experimental evaluation, we establish that the Cyclic schemes are superior to other declustering schemes, including the state of the art, in terms of both the degree of parallelism and robustness.
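The abstract does not state the allocation rule itself. As a rough illustration only, the sketch below assumes the commonly described form of cyclic declustering, in which the tile at coordinates (i_1, ..., i_d) is placed on disk (H_1·i_1 + ... + H_d·i_d) mod M. The function names, the skip values H, and the disk count M used here are hypothetical choices for the example, not values taken from the paper. The response time of a range query is measured as the largest number of its tiles that land on any one disk.

```python
from collections import Counter
from itertools import product
from math import ceil


def cyclic_disk(tile, skips, num_disks):
    """Assumed cyclic rule: disk = (sum of skip_k * coordinate_k) mod M."""
    return sum(h * i for h, i in zip(skips, tile)) % num_disks


def query_response_time(lo, hi, skips, num_disks):
    """Largest number of tiles any single disk must read for the
    range query covering tiles lo..hi (inclusive) in every dimension."""
    ranges = [range(a, b + 1) for a, b in zip(lo, hi)]
    load = Counter(cyclic_disk(t, skips, num_disks) for t in product(*ranges))
    return max(load.values())


def optimal_response_time(lo, hi, num_disks):
    """Lower bound: tiles spread perfectly evenly over all disks."""
    tiles = 1
    for a, b in zip(lo, hi):
        tiles *= b - a + 1
    return ceil(tiles / num_disks)


if __name__ == "__main__":
    skips, num_disks = (1, 2), 5          # hypothetical H and M
    lo, hi = (0, 0), (2, 2)               # a 3 x 3 range query
    print(query_response_time(lo, hi, skips, num_disks),
          optimal_response_time(lo, hi, num_disks))
```

Comparing `query_response_time` against `optimal_response_time` over many query shapes is one simple way to compare candidate skip values; the paper's own search-space reduction is not reproduced here.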
Pages: 375 - 382
Page count: 8
Related papers
50 in total
  • [41] Efficient HTTP Based I/O on Very Large Datasets for High Performance Computing with the Libdavix Library
    Devresse, Adrien
    Furano, Fabrizio
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8807 : 194 - 205
  • [42] An efficient parallel computing strategy for the processing of large GNSS network datasets
    Yang Cui
    Zhengsheng Chen
    Linyang Li
    Qinghua Zhang
    Sheng Luo
    Zhiping Lu
    GPS Solutions, 2021, 25
  • [43] An Efficient Architecture for Parallel Skyline Computation over Large Distributed Datasets
    Li, He
    Jang, Sumin
    Yoo, Jaesoo
    JOURNAL OF INTERNET TECHNOLOGY, 2014, 15 (04) : 577 - 588
  • [44] An efficient parallel computing strategy for the processing of large GNSS network datasets
    Cui, Yang
    Chen, Zhengsheng
    Li, Linyang
    Zhang, Qinghua
    Luo, Sheng
    Lu, Zhiping
    GPS SOLUTIONS, 2021, 25 (02)
  • [45] ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems
    Suren Byna
    M. Scot Breitenfeld
    Bin Dong
    Quincey Koziol
    Elena Pourmal
    Dana Robinson
    Jerome Soumagne
    Houjun Tang
    Venkatram Vishwanath
    Richard Warren
    Journal of Computer Science and Technology, 2020, 35 : 145 - 160
  • [46] Parallel and I/O-Efficient Algorithms for Non-Linear Preferential Attachment
    Allendorf, Daniel
    Meyer, Ulrich
    Penschuck, Manuel
    Tran, Hung
    2023 PROCEEDINGS OF THE SYMPOSIUM ON ALGORITHM ENGINEERING AND EXPERIMENTS, ALENEX, 2023, : 65 - 76
  • [47] Opportunistic Data-driven Execution of Parallel Programs for Efficient I/O Services
    Zhang, Xuechen
    Davis, Kei
    Jiang, Song
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2012, : 330 - 341
  • [48] HDF5 Cache VOL: Efficient and Scalable Parallel I/O through Caching Data on Node-local Storage
    Zheng, Huihuo
    Vishwanath, Venkatram
    Koziol, Quincey
    Tang, Houjun
    Ravi, John
    Mainzer, John
    Byna, Suren
    2022 22ND IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2022), 2022, : 61 - 70
  • [49] ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems
    Byna, Suren
    Breitenfeld, M. Scot
    Dong, Bin
    Koziol, Quincey
    Pourmal, Elena
    Robinson, Dana
    Soumagne, Jerome
    Tang, Houjun
    Vishwanath, Venkatram
    Warren, Richard
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2020, 35 (01) : 145 - 160
  • [50] Heuristic algorithms for I/O scheduling for efficient retrieval of large objects from tertiary storage
    Moon, C
    Kang, H
    PROCEEDINGS OF THE 12TH AUSTRALASIAN DATABASE CONFERENCE, ADC 2001, 2001, 23 (02) : 145 - 152