SciAP: A Programmable, High-Performance Platform for Large-Scale Scientific Data

Cited by: 0
Authors
Tian, Yang [1]
Li, Chao [1]
Liu, Chao [1]
Yan, Haihua [1]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, Beijing, Peoples R China
Keywords
Scientific Data; Model-Driven; Multidimensional; Spark; Partitioning;
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Scientific instruments and computer simulations, in areas such as satellite feeds, medical informatics, and bioinformatics research, are creating massive amounts of data, which require technological innovation to reveal their underlying structure and facilitate decision making. However, storage capacity, analytical accuracy, and processing efficiency in scientific research are not keeping pace with this exponential data growth. Multidimensional data structures and exclusive indexing methods make it difficult to promote parallel I/O and unified processing, and there is no out-of-the-box interoperability between large-scale scientific data and big data technologies. To address these issues, we present SciAP, a programmable, high-performance platform for large-scale scientific data. SciAP enables domain scientists to natively execute Spark programs and applications that process and analyze scientific data in HPC environments. It uses a model-driven approach to extract abstract models from heterogeneous scientific data formats and ultimately provides a unified interface for accessing raw scientific data. We integrate an automatic partitioning algorithm that determines the data partitioning layout from scientific metadata and connects with Spark's RDD structure to specify task granularity and guide parallel I/O. Experimental evaluation shows that SciAP achieves an overall improvement of 2.1x over Spark's range partitioning and a 2.3x speedup over a serial implementation of the Prestack Kirchhoff Time Migration algorithm.
Pages: 148 - 154
Number of pages: 7