Black-box statistical prediction of lossy compression ratios for scientific data

Cited by: 2
Authors
Underwood, Robert [1, 7]
Bessac, Julie [4]
Krasowska, David [5]
Calhoun, Jon C. [6]
Di, Sheng [2]
Cappello, Franck [3]
Affiliations
[1] Argonne Natl Lab, Math & Comp Sci Div, Lemont, IL USA
[2] Argonne Natl Lab, Math & Comp Sci MCS Div, Lemont, IL USA
[3] Argonne Natl Lab, Lemont, IL USA
[4] Natl Renewable Energy Lab, Golden, CO USA
[5] Northwestern Univ, Evanston, IL USA
[6] Clemson Univ, Holcombe Dept Elect & Comp Engn, Clemson, SC USA
[7] Argonne Natl Lab, Dept Math & Comp Sci, 9700 S Cass Ave, Lemont, IL 60439 USA
Funding
National Science Foundation (USA)
Keywords
Scientific data; data reduction; lossy compression; high-performance applications; data storage and movements; MULTILEVEL TECHNIQUES; REDUCTION;
DOI
10.1177/10943420231179417
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Lossy compressors are increasingly adopted in scientific research, tackling volumes of data from experiments or parallel numerical simulations and facilitating data storage and movement. In contrast with the notion of entropy in lossless compression, no theoretical or data-based quantification of lossy compressibility exists for scientific data. Users rely on trial and error to assess lossy compression performance. As a strong data-driven effort toward quantifying lossy compressibility of scientific datasets, we provide a statistical framework to predict compression ratios of lossy compressors. Our method is a two-step framework where (i) compressor-agnostic predictors are computed and (ii) statistical prediction models relying on these predictors are trained on observed compression ratios. Proposed predictors exploit spatial correlations and notions of entropy and lossyness via the quantized entropy. We study 8+ compressors on 6 scientific datasets and achieve a median percentage prediction error of less than 12%, which is substantially smaller than that of other methods, while achieving at least an 8.8x speedup when searching for a specific compression ratio and a 7.8x speedup when determining the best compressor out of a collection.
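The two-step structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names the quantized entropy but does not define it, so the version here assumes uniform quantization with bin width equal to twice an absolute error bound, followed by the Shannon entropy of the bin indices. The abstract also does not specify the prediction model, so step (ii) is assumed to be an ordinary least-squares fit on the logarithm of the compression ratio. The function names `quantized_entropy`, `fit_log_linear`, and `predict_ratio` are hypothetical.

```python
import numpy as np


def quantized_entropy(data, error_bound):
    """Shannon entropy (bits per value) of uniformly quantized data.

    Assumption: bins have width 2 * error_bound, so every value lies
    within error_bound of its bin center. The paper may define the
    quantity differently.
    """
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    values = np.asarray(data, dtype=np.float64).ravel()
    codes = np.floor(values / (2.0 * error_bound)).astype(np.int64)
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))


def _design_matrix(predictors):
    # Prepend an intercept column; accepts a 1-D or 2-D predictor array.
    predictors = np.asarray(predictors, dtype=np.float64)
    return np.column_stack([np.ones(len(predictors)), predictors])


def fit_log_linear(predictors, ratios):
    """Step (ii), assumed form: least-squares fit of log(compression ratio)."""
    target = np.log(np.asarray(ratios, dtype=np.float64))
    coef, *_ = np.linalg.lstsq(_design_matrix(predictors), target, rcond=None)
    return coef


def predict_ratio(coef, predictors):
    """Predict compression ratios from predictors with fitted coefficients."""
    return np.exp(_design_matrix(predictors) @ coef)
```

In this sketch, a predictor such as `quantized_entropy(field, error_bound)` would be computed once per dataset and error bound, independently of any compressor, and the fitted coefficients would be learned separately for each compressor from its observed compression ratios.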
Pages: 412-433
Page count: 22