Black-box statistical prediction of lossy compression ratios for scientific data

Cited by: 2
Authors
Underwood, Robert [1, 7]
Bessac, Julie [4]
Krasowska, David [5]
Calhoun, Jon C. [6]
Di, Sheng [2]
Cappello, Franck [3]
Affiliations
[1] Argonne Natl Lab, Math & Comp Sci Div, Lemont, IL USA
[2] Argonne Natl Lab, Math & Comp Sci MCS Div, Lemont, IL USA
[3] Argonne Natl Lab, Lemont, IL USA
[4] Natl Renewable Energy Lab, Golden, CO USA
[5] Northwestern Univ, Evanston, IL USA
[6] Clemson Univ, Holcombe Dept Elect & Comp Engn, Clemson, SC USA
[7] Argonne Natl Lab, Dept Math & Comp Sci, 9700 S Cass Ave, Lemont, IL 60439 USA
Funding
National Science Foundation (USA)
Keywords
Scientific data; data reduction; lossy compression; high-performance applications; data storage and movements; MULTILEVEL TECHNIQUES; REDUCTION
DOI
10.1177/10943420231179417
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Lossy compressors are increasingly adopted in scientific research, where they handle the volumes of data produced by experiments or parallel numerical simulations and facilitate data storage and movement. In contrast with the notion of entropy in lossless compression, no theoretical or data-based quantification of lossy compressibility exists for scientific data, so users rely on trial and error to assess lossy compression performance. As a strong data-driven effort toward quantifying the lossy compressibility of scientific datasets, we provide a statistical framework to predict the compression ratios of lossy compressors. Our method is a two-step framework in which (i) compressor-agnostic predictors are computed and (ii) statistical prediction models relying on these predictors are trained on observed compression ratios. The proposed predictors exploit spatial correlations and notions of entropy and lossiness via the quantized entropy. We study 8+ compressors on 6 scientific datasets and achieve a median percentage prediction error of less than 12%, which is substantially smaller than that of other methods, while achieving at least an 8.8x speedup when searching for a specific compression ratio and a 7.8x speedup when determining the best compressor out of a collection.
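The two-step framework described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the uniform quantizer with bin width twice the error bound, the lag-1 autocorrelation used as a stand-in for spatial correlation, and the log-linear least-squares model are all assumptions chosen for illustration. Only the overall shape (compute compressor-agnostic predictors, then fit a model to observed compression ratios, then score it by median percentage error) follows the abstract.

```python
import numpy as np


def quantized_entropy(data, error_bound):
    """Shannon entropy (bits per value) after uniform quantization.

    Assumption: bins of width 2 * error_bound, a common proxy for an
    absolute-error-bounded quantizer; the paper's definition may differ.
    """
    values = np.asarray(data, dtype=np.float64).ravel()
    codes = np.floor(values / (2.0 * error_bound)).astype(np.int64)
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def lag1_correlation(data):
    """Lag-1 autocorrelation along the last axis (hypothetical
    stand-in for the paper's spatial-correlation predictors)."""
    x = np.asarray(data, dtype=np.float64)
    a = x[..., :-1].ravel()
    b = x[..., 1:].ravel()
    if a.std() == 0.0 or b.std() == 0.0:
        return 1.0  # a constant field is treated as perfectly correlated
    return float(np.corrcoef(a, b)[0, 1])


def fit_log_linear(features, ratios):
    """Step (ii): least-squares fit of log(compression ratio) on the
    predictors, with an intercept column. The model form is an assumption."""
    features = np.asarray(features, dtype=np.float64)
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, np.log(ratios), rcond=None)
    return coef


def predict_ratio(coef, features):
    """Predict compression ratios from predictors using fitted coefficients."""
    features = np.asarray(features, dtype=np.float64)
    X = np.column_stack([np.ones(len(features)), features])
    return np.exp(X @ coef)


def median_percentage_error(observed, predicted):
    """Median of |predicted - observed| / observed, in percent."""
    observed = np.asarray(observed, dtype=np.float64)
    predicted = np.asarray(predicted, dtype=np.float64)
    return float(np.median(np.abs(predicted - observed) / observed) * 100.0)
```

In use, each training sample would be one (field, error bound) pair: compute its predictors, run the real compressor once to observe the ratio, and fit a separate model per compressor. Prediction for new data then needs only the predictors, which is where a speedup over trial-and-error compression would come from.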
Pages: 412-433
Page count: 22