CoDDA: A Flexible Copula-based Distribution Driven Analysis Framework for Large-Scale Multivariate Data

Cited by: 12
Authors
Hazarika, Subhashis [1]
Dutta, Soumya [1]
Shen, Han-Wei [1]
Chen, Jen-Ping [2]
Affiliations
[1] Ohio State Univ, GRAVITY Res Grp, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Mech & Aerosp Engn, Columbus, OH 43210 USA
Keywords
In situ processing; Distribution-based; Multivariate; Query-driven; Copula; Nonparametric models; Visualization; Uncertainty; Variability
DOI
10.1109/TVCG.2018.2864801
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
CoDDA (Copula-based Distribution Driven Analysis) is a flexible framework for large-scale multivariate datasets. A common strategy for dealing with large-scale scientific simulation data is to partition the simulation domain and create statistical data summaries. Storing these compact statistical summaries instead of the high-resolution raw simulation data reduces storage overhead and alleviates the I/O bottleneck. Such summaries, often represented as statistical probability distributions, can serve various post-hoc analysis and visualization tasks. However, for multivariate simulation data, creating summaries with standard multivariate distributions is not feasible: they are either storage-inefficient or too computationally expensive to estimate at simulation time (in situ) for a large number of variables. In this work, we use copula functions to propose a flexible multivariate distribution-based data modeling and analysis framework that offers significant data reduction and can be used in an in situ environment. The framework also facilitates storing the associated spatial information alongside the multivariate distributions in an efficient representation. Using the proposed multivariate data summaries, we perform various multivariate post-hoc analyses, such as query-driven visualization and sampling-based visualization. We evaluate the proposed method on multiple real-world multivariate scientific datasets. To demonstrate the efficacy of the framework in an in situ environment, we apply it to a large-scale flow simulation.
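The abstract describes the approach only at a high level. As a rough illustration of why a copula keeps multivariate summaries compact, the sketch below summarizes one spatial block with per-variable histograms plus a Gaussian-copula correlation matrix, then answers a range query by sampling from that summary. The Gaussian copula, the histogram marginals, and every function name (fit_block_summary, sample_block, query_probability, and the helpers) are assumptions made for illustration, not CoDDA's actual design. Under these assumptions, a block with d variables and b bins per histogram stores roughly d*b bin counts plus d(d-1)/2 correlations, versus b^d cells for a full joint histogram (for example, about 70 numbers versus 65,536 cells for d = 4, b = 16).

```python
import math
import random
from statistics import NormalDist

# Hypothetical sketch: Gaussian copula + histogram marginals per spatial block.
# The copula family and marginal model are assumptions, not taken from CoDDA.
STD_NORMAL = NormalDist()  # standard normal used for the copula transforms


def histogram(column, bins):
    """Equal-width histogram marginal: returns (bin edges, bin counts)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data
    counts = [0] * bins
    for v in column:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


def inverse_hist_cdf(edges, counts, u):
    """Invert the piecewise-linear histogram CDF at probability u in (0, 1)."""
    target, cum = u * sum(counts), 0.0
    for k, c in enumerate(counts):
        if c > 0 and cum + c >= target:
            return edges[k] + (target - cum) / c * (edges[k + 1] - edges[k])
        cum += c
    return edges[-1]


def normal_scores(column):
    """Rank-transform one variable to pseudo-observations, then to N(0, 1) scores."""
    n = len(column)
    scores = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=column.__getitem__), start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cholesky(m):
    """Lower-triangular L with L L^T = m (m symmetric positive definite)."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def fit_block_summary(rows, bins=16):
    """Summarize one block of n samples x d variables (hypothetical helper)."""
    columns = [list(c) for c in zip(*rows)]
    scores = [normal_scores(c) for c in columns]
    d = len(columns)
    corr = [[1.0 if i == j else pearson(scores[i], scores[j]) for j in range(d)]
            for i in range(d)]
    return {"marginals": [histogram(c, bins) for c in columns], "corr": corr}


def sample_block(summary, m, rng):
    """Draw m synthetic d-variate samples from the copula summary."""
    low = cholesky(summary["corr"])
    d = len(low)
    out = []
    for _ in range(m):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(low[i][k] * g[k] for k in range(i + 1)) for i in range(d)]
        u = [min(max(STD_NORMAL.cdf(zi), 1e-12), 1 - 1e-12) for zi in z]
        out.append(tuple(inverse_hist_cdf(*summary["marginals"][i], u[i])
                         for i in range(d)))
    return out


def query_probability(summary, ranges, m=5000, seed=0):
    """Monte Carlo estimate of P(lo_i <= X_i <= hi_i for every variable i)."""
    rng = random.Random(seed)
    samples = sample_block(summary, m, rng)
    hits = sum(all(lo <= x <= hi for x, (lo, hi) in zip(s, ranges)) for s in samples)
    return hits / m
```

For instance, query_probability(summary, [(0.0, float("inf")), (0.0, float("inf"))]) estimates the fraction of a two-variable block in which both variables are non-negative; in a query-driven visualization, such per-block probabilities could be mapped to color. Separating marginals from the dependence structure is what keeps the per-block storage linear in d for the marginals and quadratic only in the small correlation matrix.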
Pages: 1214-1224
Number of pages: 11