Black-box statistical prediction of lossy compression ratios for scientific data

Cited by: 2
Authors
Underwood, Robert [1, 7]
Bessac, Julie [4]
Krasowska, David [5]
Calhoun, Jon C. [6]
Di, Sheng [2]
Cappello, Franck [3]
Affiliations
[1] Argonne Natl Lab, Math & Comp Sci Div, Lemont, IL USA
[2] Argonne Natl Lab, Math & Comp Sci MCS Div, Lemont, IL USA
[3] Argonne Natl Lab, Lemont, IL USA
[4] Natl Renewable Energy Lab, Golden, CO USA
[5] Northwestern Univ, Evanston, IL USA
[6] Clemson Univ, Holcombe Dept Elect & Comp Engn, Clemson, SC USA
[7] Argonne Natl Lab, Dept Math & Comp Sci, 9700 S Cass Ave, Lemont, IL 60439 USA
Funding
National Science Foundation (USA)
Keywords
Scientific data; data reduction; lossy compression; high-performance applications; data storage and movements; multilevel techniques; reduction
DOI
10.1177/10943420231179417
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Lossy compressors are increasingly adopted in scientific research to handle the large volumes of data produced by experiments and parallel numerical simulations and to ease data storage and movement. In contrast with the notion of entropy in lossless compression, no theoretical or data-based quantification of lossy compressibility exists for scientific data, so users rely on trial and error to assess lossy compression performance. As a data-driven step toward quantifying the lossy compressibility of scientific datasets, we provide a statistical framework to predict the compression ratios of lossy compressors. Our method is a two-step framework in which (i) compressor-agnostic predictors are computed and (ii) statistical prediction models relying on these predictors are trained on observed compression ratios. The proposed predictors exploit spatial correlations and notions of entropy and lossiness via the quantized entropy. We study 8+ compressors on 6 scientific datasets and achieve a median percentage prediction error of less than 12%, which is substantially smaller than that of other methods, while achieving at least an 8.8x speedup when searching for a specific compression ratio and a 7.8x speedup when determining the best compressor out of a collection.
Pages: 412-433
Page count: 22
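
The two-step framework summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the bin width of twice the error bound, the lag-1 autocorrelation as a stand-in for "spatial correlation", and the one-variable least-squares line as the "statistical prediction model" are all assumptions made for illustration. Observed compression ratios must come from actually running a compressor; none are supplied here.

```python
import math
import statistics
from collections import Counter


def quantized_entropy(values, error_bound):
    """Shannon entropy (bits per value) after uniform quantization.

    Assumption: bins have width 2 * error_bound, so each value lies within
    error_bound of its bin center. The paper's definition may differ.
    """
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    width = 2.0 * error_bound
    counts = Counter(math.floor(v / width) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation, a simple 1-D proxy for spatial correlation."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 1.0  # convention chosen here for constant data
    cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cov / var


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (step ii, simplified)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def median_percentage_error(predicted, observed):
    """Median of |predicted - observed| / |observed|, in percent."""
    return statistics.median(
        abs(p - o) / abs(o) * 100.0 for p, o in zip(predicted, observed)
    )
```

In this sketch, step (i) would call `quantized_entropy` and `lag1_autocorrelation` on each data field, and step (ii) would call `fit_line` on those predictors against compression ratios measured for one compressor. A single-predictor line keeps the example short; the abstract does not specify the model family used.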