CoDDA: A Flexible Copula-based Distribution Driven Analysis Framework for Large-Scale Multivariate Data

Cited by: 12
Authors
Hazarika, Subhashis [1]
Dutta, Soumya [1]
Shen, Han-Wei [1]
Chen, Jen-Ping [2]
Affiliations
[1] Ohio State Univ, GRAVITY Res Grp, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Mech & Aerosp Engn, Columbus, OH 43210 USA
Keywords
In situ processing; Distribution-based; Multivariate; Query-driven; Copula; NONPARAMETRIC MODELS; VISUALIZATION; UNCERTAINTY; VARIABILITY;
DOI
10.1109/TVCG.2018.2864801
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
CoDDA (Copula-based Distribution Driven Analysis) is a flexible framework for large-scale multivariate datasets. A common strategy for dealing with large-scale scientific simulation data is to partition the simulation domain and create statistical data summaries. Storing these compact statistical summaries instead of the high-resolution raw simulation data reduces storage overhead and alleviates the I/O bottleneck. Such summaries, often represented as probability distributions, can serve various post-hoc analysis and visualization tasks. However, for multivariate simulation data, creating data summaries with standard multivariate distributions is not feasible: they are either storage inefficient or too computationally expensive to estimate at simulation time (in situ) for a large number of variables. In this work, using copula functions, we propose a flexible multivariate distribution-based data modeling and analysis framework that offers significant data reduction and can be used in an in situ environment. The framework also facilitates storing the associated spatial information along with the multivariate distributions in an efficient representation. Using the proposed multivariate data summaries, we perform various multivariate post-hoc analyses, such as query-driven visualization and sampling-based visualization. We evaluate the proposed method on multiple real-world multivariate scientific datasets. To demonstrate the efficacy of the framework in an in situ environment, we apply it to a large-scale flow simulation.
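As a rough illustration of the idea in the abstract, the sketch below summarizes one spatial block by storing each variable's marginal distribution separately and capturing the dependence between variables with a copula. It assumes a Gaussian copula and quantile-table marginals, neither of which the abstract specifies, and the names `normal_scores`, `fit_block_summary`, and `sample_from_summary` are hypothetical and not taken from the paper.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the Gaussian copula


def normal_scores(column):
    """Map one variable to standard-normal scores via its empirical CDF."""
    n = len(column)
    ranks = np.argsort(np.argsort(column)) + 1   # ranks 1..n
    u = ranks / (n + 1.0)                        # strictly inside (0, 1)
    return np.array([_STD_NORMAL.inv_cdf(p) for p in u])


def fit_block_summary(block, n_levels=33):
    """Summarize a (n_samples, n_vars) block.

    Assumption: marginals are kept as quantile tables and dependence as the
    correlation matrix of the normal scores (Gaussian copula parameter).
    """
    n_vars = block.shape[1]
    z = np.column_stack([normal_scores(block[:, j]) for j in range(n_vars)])
    levels = np.linspace(0.0, 1.0, n_levels)
    return {
        "levels": levels,
        "marginals": np.quantile(block, levels, axis=0),  # (n_levels, n_vars)
        "corr": np.corrcoef(z, rowvar=False),             # (n_vars, n_vars)
    }


def sample_from_summary(summary, n, rng):
    """Draw n multivariate samples from the stored summary."""
    corr = summary["corr"]
    d = corr.shape[0]
    # 1. Correlated standard-normal draws carry the dependence structure.
    z = rng.multivariate_normal(np.zeros(d), corr, size=n)
    # 2. The normal CDF turns them into correlated uniforms (the copula).
    u = np.vectorize(_STD_NORMAL.cdf)(z)
    # 3. Each marginal's inverse CDF (quantile table) restores the values.
    out = np.empty_like(z)
    for j in range(d):
        out[:, j] = np.interp(u[:, j], summary["levels"],
                              summary["marginals"][:, j])
    return out
```

Under these assumptions, a block with d variables needs d quantile tables plus d(d-1)/2 correlation coefficients, whereas a full d-dimensional histogram with b bins per axis needs b^d counts. Samples drawn from the summary can then stand in for the raw data in sampling-based post-hoc analysis.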
Pages: 1214-1224
Number of pages: 11
Related papers
50 items in total
  • [41] Copula-based semiparametric analysis for time series data with detection limits
    Li, Fuyuan
    Tang, Yanlin
    Wang, Huixia Judy
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2019, 47 (03) : 438 - 454
  • [42] Risk Optimization for Revenue-Driven Wireless Video Broadcasting Systems: A Copula-Based Framework
    Ji, Wen
    Poor, H. Vincent
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 1757 - 1771
  • [43] A COPULA-DRIVEN UNSUPERVISED LEARNING FRAMEWORK FOR ANOMALY DETECTION WITH MULTIVARIATE HETEROGENEOUS DATA
    Damodaran, Swaroop
    Padmanabhan, Ram
    Maahin, R.
    Gurugopinath, Sanjeev
    2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2021,
  • [44] Water shortage risk assessment considering large-scale regional transfers: a copula-based uncertainty case study in Lunan, China
    Xueping Gao
    Yinzhu Liu
    Bowen Sun
    Environmental Science and Pollution Research, 2018, 25 : 23328 - 23341
  • [45] Data-driven framework for large-scale prediction of charging energy in electric vehicles
    Zhao, Yang
    Wang, Zhenpo
    Shen, Zuo-Jun Max
    Sun, Fengchun
    APPLIED ENERGY, 2021, 282
  • [46] Water shortage risk assessment considering large-scale regional transfers: a copula-based uncertainty case study in Lunan, China
    Gao, Xueping
    Liu, Yinzhu
    Sun, Bowen
    ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH, 2018, 25 (23) : 23328 - 23341
  • [47] ADR Visualization: A Generalized Framework for Ranking Large-Scale Scientific Data using Analysis-Driven Refinement
    Nouanesengsy, Boonthanome
    Woodring, Jonathan
    Patchett, John
    Myers, Kary
    Ahrens, James
    2014 IEEE 4TH SYMPOSIUM ON LARGE DATA ANALYSIS AND VISUALIZATION (LDAV), 2014, : 43 - 50
  • [48] A classification based framework for quantitative description of large-scale microarray data
    Sangurdekar, Dipen P.
    Srienc, Friedrich
    Khodursky, Arkady B.
    GENOME BIOLOGY, 2006, 7 (04)
  • [49] An elastic framework for ensemble-based large-scale data assimilation
    Friedemann, Sebastian
    Raffin, Bruno
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2022, 36 (04) : 543 - 563
  • [50] A classification based framework for quantitative description of large-scale microarray data
    Dipen P Sangurdekar
    Friedrich Srienc
    Arkady B Khodursky
    Genome Biology, 7