Black-box statistical prediction of lossy compression ratios for scientific data

Cited by: 2
Authors
Underwood, Robert [1, 7]
Bessac, Julie [4]
Krasowska, David [5]
Calhoun, Jon C. [6]
Di, Sheng [2]
Cappello, Franck [3]
Affiliations
[1] Argonne Natl Lab, Math & Comp Sci Div, Lemont, IL USA
[2] Argonne Natl Lab, Math & Comp Sci MCS Div, Lemont, IL USA
[3] Argonne Natl Lab, Lemont, IL USA
[4] Natl Renewable Energy Lab, Golden, CO USA
[5] Northwestern Univ, Evanston, IL USA
[6] Clemson Univ, Holcombe Dept Elect & Comp Engn, Clemson, SC USA
[7] Argonne Natl Lab, Dept Math & Comp Sci, 9700 S Cass Ave, Lemont, IL 60439 USA
Funding
U.S. National Science Foundation
Keywords
Scientific data; data reduction; lossy compression; high-performance applications; data storage and movements; MULTILEVEL TECHNIQUES; REDUCTION;
DOI
10.1177/10943420231179417
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Lossy compressors are increasingly adopted in scientific research to handle the volumes of data produced by experiments and parallel numerical simulations and to ease data storage and movement. Whereas entropy quantifies compressibility for lossless compression, no theoretical or data-based quantification of lossy compressibility exists for scientific data, so users rely on trial and error to assess lossy compression performance. As a data-driven effort toward quantifying the lossy compressibility of scientific datasets, we provide a statistical framework to predict the compression ratios of lossy compressors. Our method is a two-step framework in which (i) compressor-agnostic predictors are computed and (ii) statistical prediction models relying on these predictors are trained on observed compression ratios. The proposed predictors exploit spatial correlations and notions of entropy and lossiness via the quantized entropy. We study 8+ compressors on 6 scientific datasets and achieve a median percentage prediction error below 12%, which is substantially smaller than that of other methods, while achieving at least an 8.8x speedup when searching for a specific compression ratio and a 7.8x speedup when determining the best compressor out of a collection.
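The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform quantization with bin width `2 * error_bound`, and the single-predictor log-linear model are all assumptions chosen for illustration, and the paper's framework also uses spatial-correlation predictors that are omitted here.

```python
import math
from collections import Counter


def quantized_entropy(values, error_bound):
    """Step (i): a compressor-agnostic predictor.

    Shannon entropy (bits per value) of the data after uniform quantization.
    Assumption: bins of width 2 * error_bound stand in for an absolute
    error bound; a real compressor may quantize differently.
    """
    width = 2.0 * error_bound
    codes = [math.floor(v / width) for v in values]
    n = len(codes)
    counts = Counter(codes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def fit_log_linear(predictor, ratios):
    """Step (ii): ordinary least squares for log(CR) = a + b * predictor.

    Hypothetical one-predictor model; returns (intercept, slope).
    """
    y = [math.log(r) for r in ratios]
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(predictor, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_ratio(model, predictor_value):
    """Predict a compression ratio from a fitted (intercept, slope) pair."""
    intercept, slope = model
    return math.exp(intercept + slope * predictor_value)
```

In use, one would compute `quantized_entropy` for each training field at the target error bound, run the real compressor once per field to observe its compression ratio, call `fit_log_linear` on those pairs, and then call `predict_ratio` on new fields without compressing them. Fitting in log space is a design choice of this sketch that keeps predicted ratios positive.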
Pages: 412 - 433
Page count: 22
Related papers
50 results in total
  • [21] Black-box Test Data Generation for GUI Testing
    Darvish, Ali
    Chang, Carl K.
    2014 14TH INTERNATIONAL CONFERENCE ON QUALITY SOFTWARE (QSIC 2014), 2014, : 133 - 138
  • [22] Engineering the Black-Box Meta Model of Data Exploration
    Winter, Robert
    Yang, Li
    ADVANCES IN ENTERPRISE ENGINEERING XIII, EEWC 2019, 2020, 374 : 85 - 101
  • [23] Black-box Concurrent Data Structures for NUMA Architectures
    Calciu, Irina
    Sen, Siddhartha
    Balakrishnan, Mahesh
    Aguilera, Marcos K.
    TWENTY-SECOND INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOS XXII), 2017, : 206 - 220
  • [24] On the clinical acceptance of black-box systems for EEG seizure prediction
    Pinto, Mauro F.
    Leal, Adriana
    Lopes, Fabio
    Pais, Jose
    Dourado, Antonio
    Sales, Francisco
    Martins, Pedro
    Teixeira, Cesar A.
    EPILEPSIA OPEN, 2022, 7 (02) : 247 - 259
  • [25] Inspecting Prediction Confidence for Detecting Black-Box Backdoor Attacks
    Wang, Tong
    Yao, Yuan
    Xu, Feng
    Xu, Miao
    An, Shengwei
    Wang, Ting
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 1, 2024, : 274 - 282
  • [26] Black-box modelling approaches for the prediction of microbiological bacterial growth
    Poli, Cecilia
    Pietrabissa, Antonio
    PROCEEDINGS OF THE 2006 IEEE INTERNATIONAL CONFERENCE ON CONTROL APPLICATIONS, VOLS 1-4, 2006, : 2123 - +
  • [27] BLACK-BOX COLLISION ATTACKS ON THE COMPRESSION FUNCTION OF THE GOST HASH FUNCTION
    Courtois, Nicolas T.
    Mourouzis, Theodosis
    SECRYPT 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SECURITY AND CRYPTOGRAPHY, 2011, : 325 - 332
  • [28] A Black-Box Fork-Join Latency Prediction Model for Data-Intensive Applications
    Nguyen, Minh
    Alesawi, Sami
    Li, Ning
    Che, Hao
    Jiang, Hong
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (09) : 1983 - 2000
  • [29] Discovering Unexpected Local Nonlinear Interactions in Scientific Black-box Models
    Doron, Michael
    Segev, Idan
    Shahaf, Dafna
    KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2019, : 425 - 435
  • [30] Black-box and Gray-box Components as Elements for Performance Prediction in Telecommunications System
    Skuliber, I.
    Huljenic, D.
    Desic, S.
    CONTEL 2009: PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS, 2009, : 131 - 134