Black-box statistical prediction of lossy compression ratios for scientific data

Cited by: 2

Authors:

Underwood, Robert [1,7]

Bessac, Julie [4]

Krasowska, David [5]

Calhoun, Jon C. [6]

Di, Sheng [2]

Cappello, Franck [3]

Affiliations:

[1] Argonne Natl Lab, Math & Comp Sci Div, Lemont, IL USA

[2] Argonne Natl Lab, Math & Comp Sci MCS Div, Lemont, IL USA

[3] Argonne Natl Lab, Lemont, IL USA

[4] Natl Renewable Energy Lab, Golden, CO USA

[5] Northwestern Univ, Evanston, IL USA

[6] Clemson Univ, Holcombe Dept Elect & Comp Engn, Clemson, SC USA

[7] Argonne Natl Lab, Dept Math & Comp Sci, 9700 S Cass Ave, Lemont, IL 60439 USA

Source:

INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS | 2023, Vol. 37, No. 3-4

Funding:

U.S. National Science Foundation;

Keywords:

Scientific data; data reduction; lossy compression; high-performance applications; data storage and movements; MULTILEVEL TECHNIQUES; REDUCTION;

DOI:

10.1177/10943420231179417

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Lossy compressors are increasingly adopted in scientific research, where they handle the volumes of data produced by experiments or parallel numerical simulations and ease data storage and movement. In contrast with the notion of entropy in lossless compression, no theoretical or data-based quantification of lossy compressibility exists for scientific data. Users rely on trial and error to assess lossy compression performance. As a strong data-driven effort toward quantifying lossy compressibility of scientific datasets, we provide a statistical framework to predict compression ratios of lossy compressors. Our method is a two-step framework where (i) compressor-agnostic predictors are computed and (ii) statistical prediction models relying on these predictors are trained on observed compression ratios. The proposed predictors exploit spatial correlations and notions of entropy and lossiness via the quantized entropy. We study 8+ compressors on 6 scientific datasets and achieve a median percentage prediction error of less than 12%, which is substantially smaller than that of other methods, while achieving at least an 8.8x speedup when searching for a specific compression ratio and a 7.8x speedup when determining the best compressor out of a collection.
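
The two-step idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform quantizer with bin width equal to twice the error bound, and the linear least-squares model on the log compression ratio are all assumptions made here for illustration only. Step (i) computes a compressor-agnostic predictor (a quantized entropy); step (ii) fits a model on compression ratios that would have been observed by running a real compressor.

```python
import numpy as np


def quantized_entropy(data, error_bound):
    """Shannon entropy (bits per value) of the data after uniform quantization.

    Assumption: bins of width 2 * error_bound stand in for the effect of an
    absolute error bound; real compressors quantize in their own ways.
    """
    values = np.asarray(data, dtype=np.float64).ravel()
    codes = np.floor(values / (2.0 * error_bound)).astype(np.int64)
    _, counts = np.unique(codes, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())


def fit_linear_model(features, log_ratios):
    """Least-squares fit of log(compression ratio) on predictors plus an intercept.

    Assumption: a plain linear model is used here as a stand-in for the
    statistical prediction models mentioned in the abstract.
    """
    features = np.asarray(features, dtype=np.float64)
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, np.asarray(log_ratios, dtype=np.float64), rcond=None)
    return coef


def predict_ratio(coef, features):
    """Predicted compression ratios for new predictor rows."""
    features = np.asarray(features, dtype=np.float64)
    design = np.column_stack([np.ones(len(features)), features])
    return np.exp(design @ coef)
```

In use, each training row of `features` would hold the predictors computed for one dataset and error bound (for example, the quantized entropy), and `log_ratios` would hold the logarithm of the ratio actually measured for one compressor. Fitting on the log scale keeps predicted ratios positive, which is a design choice of this sketch rather than something stated in the record.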


Pages: 412-433

Page count: 22
