Error-Controlled Data Reduction Approach for Large-Scale Structured Datasets

Cited by: 0
Authors
Ai Z. [1, 2]
Leng J. [1, 2]
Xia F. [1, 2]
Wang H. [1, 2]
Cao Y. [1, 2]
Affiliations
[1] Institute of Applied Physics and Computational Mathematics, Beijing
[2] Software Center for High Performance Numerical Simulation, China Academy of Engineering Physics, Beijing
Keywords
Adaptive refinement; Data reduction; Error-controlled; Multi-resolution techniques; Structured datasets
DOI
10.3724/SP.J.1089.2021.19263
Abstract
The massive datasets generated by scientific and engineering simulations have reached terabytes (TB) or even petabytes (PB) in size. Data reduction has thus become one of the most important tools for saving I/O and storage costs. To support high-precision visualization and analysis, an error-controlled data reduction approach is proposed for large-scale structured datasets. First, taking the difference between the reduced data and the original data as a constraint, a multi-level, structured, adaptively refined background grid is constructed according to the spatial distribution of the underlying physical fields. Second, the original data is interpolated and mapped onto the background grid, yielding data with far fewer cells and a lower storage cost. Finally, the reduced data is exported to the parallel file system in real time. The proposed algorithm is implemented on top of the parallel programming framework JASMIN, so it can be directly coupled with numerical simulation programs developed with JASMIN. Test results demonstrate that the parallel algorithm scales to tens of thousands of CPU cores. The proposed algorithm has been successfully applied to the electromagnetic simulation of unmanned aerial vehicle irradiation. The cell count of a structured dataset with one hundred billion cells is reduced by 99.8%, with a relative error of less than 10%. The peak signal-to-noise ratio between the two images rendered from the reduced data and the original data is 47.08 dB, which indicates high similarity and thus satisfies the precision requirement of visualization. © 2021, Beijing China Science Journal Publishing Co. Ltd. All rights reserved.
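The idea in the abstract (coarsen the grid wherever the error against the original data stays within a bound, and refine elsewhere) can be illustrated with a minimal serial sketch. This is not the authors' implementation and does not use JASMIN: it assumes a square 2D field whose side is a power of two, replaces the paper's interpolation with simple block averages, and all names (`reduce_field`, `reconstruct`, `psnr`, `tol`) are hypothetical. The PSNR here is computed on the raw field values, whereas the abstract reports it for rendered images.

```python
import numpy as np

def reduce_block(field, x0, y0, size, tol, scale, out):
    """Recursively coarsen one square block, appending (x0, y0, size, value) leaves to out."""
    block = field[y0:y0 + size, x0:x0 + size]
    mean = float(block.mean())
    # Worst-case error of replacing the block by a single coarse cell,
    # measured relative to the global field magnitude (an assumed error norm).
    err = np.abs(block - mean).max() / scale
    if err <= tol or size == 1:
        out.append((x0, y0, size, mean))
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            reduce_block(field, x0 + dx, y0 + dy, half, tol, scale, out)

def reduce_field(field, tol):
    """Build an adaptively refined cell list whose relative error is at most tol."""
    n = field.shape[0]
    assert field.shape == (n, n) and (n & (n - 1)) == 0, "square power-of-two grid assumed"
    scale = float(np.abs(field).max()) or 1.0  # avoid dividing by zero for an all-zero field
    cells = []
    reduce_block(field, 0, 0, n, tol, scale, cells)
    return cells

def reconstruct(cells, n):
    """Expand the reduced cells back onto the original uniform grid."""
    rec = np.empty((n, n))
    for x0, y0, size, value in cells:
        rec[y0:y0 + size, x0:x0 + size] = value
    return rec

def psnr(a, b, peak):
    """Peak signal-to-noise ratio in dB between two arrays with the given peak value."""
    mse = float(np.mean((a - b) ** 2))
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

# Synthetic test field: a narrow Gaussian pulse on a 256 x 256 grid.
n = 256
y, x = np.mgrid[0:n, 0:n] / n
field = np.exp(-((x - 0.3) ** 2 + (y - 0.6) ** 2) / 0.002)

cells = reduce_field(field, tol=0.1)
rec = reconstruct(cells, n)
rel_err = np.abs(rec - field).max() / np.abs(field).max()
reduction = 1 - len(cells) / field.size
print(f"cells kept: {len(cells)}, reduction: {reduction:.1%}, "
      f"max relative error: {rel_err:.3f}, PSNR: {psnr(rec, field, field.max()):.2f} dB")
```

Because a block is only kept coarse when its worst-case deviation is within `tol`, and single cells are always exact, the reconstructed field respects the error bound by construction. Smooth or empty regions collapse into a few large cells, which is where the reduction comes from.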
Pages: 1795-1802
Number of pages: 7
Related papers
30 in total
  • [1] Li S, Marsaglia N, Garth C, et al., Data reduction techniques for simulation, visualization and data analysis, Computer Graphics Forum, 37, 6, pp. 422-447, (2018)
  • [2] Witten I H, Neal R M, Cleary J G., Arithmetic coding for data compression, Communications of the ACM, 30, 6, pp. 520-540, (1987)
  • [3] Ziv J, Lempel A., A universal algorithm for sequential data compression, IEEE Transactions on Information Theory, 23, 3, pp. 337-343, (1977)
  • [4] Strang G, Nguyen T., Wavelets and filter banks, (1996)
  • [5] Ahmed N, Natarajan T, Rao K R., Discrete cosine transform, IEEE Transactions on Computers, C-23, 1, pp. 90-93, (1974)
  • [6] Gong Z H, Rogers T, Jenkins J, et al., MLOC: multi-level layout optimization framework for compressed scientific data exploration with heterogeneous access patterns, Proceedings of the 41st International Conference on Parallel Processing, pp. 239-248, (2012)
  • [7] Iverson J, Kamath C, Karypis G., Fast and effective lossy compression algorithms for scientific datasets, Proceedings of European Conference on Parallel Processing, pp. 843-856, (2012)
  • [8] Gersho A, Gray R M., Vector quantization and signal compression, (1992)
  • [9] Lakshminarasimhan S, Shah N, Ethier S, et al., Compressing the incompressible with ISABELA: in-situ reduction of spatio-temporal data, Proceedings of European Conference on Parallel Processing, pp. 366-379, (2011)
  • [10] Di S, Cappello F., Fast error-bounded lossy HPC data compression with SZ, Proceedings of the IEEE International Parallel and Distributed Processing Symposium, pp. 730-739, (2016)