CoDDA: A Flexible Copula-based Distribution Driven Analysis Framework for Large-Scale Multivariate Data

Cited by: 12
Authors
Hazarika, Subhashis [1]
Dutta, Soumya [1]
Shen, Han-Wei [1]
Chen, Jen-Ping [2]
Affiliations
[1] Ohio State Univ, GRAVITY Res Grp, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Mech & Aerosp Engn, Columbus, OH 43210 USA
Keywords
In situ processing; Distribution-based; Multivariate; Query-driven; Copula; NONPARAMETRIC MODELS; VISUALIZATION; UNCERTAINTY; VARIABILITY;
DOI
10.1109/TVCG.2018.2864801
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
CoDDA (Copula-based Distribution Driven Analysis) is a flexible framework for large-scale multivariate datasets. A common strategy for dealing with large-scale scientific simulation data is to partition the simulation domain and create statistical data summaries. Storing these compact summaries instead of the high-resolution raw simulation data reduces storage overhead and alleviates the I/O bottleneck. Such summaries, often represented as probability distributions, can serve various post-hoc analysis and visualization tasks. For multivariate simulation data, however, creating data summaries with standard multivariate distributions is not feasible: such distributions are either storage inefficient or too computationally expensive to estimate during the simulation (in situ) for a large number of variables. In this work, using copula functions, we propose a flexible multivariate distribution-based data modeling and analysis framework that offers significant data reduction and can be used in an in situ environment. The framework also facilitates storing the associated spatial information along with the multivariate distributions in an efficient representation. Using the proposed multivariate data summaries, we perform various multivariate post-hoc analyses such as query-driven visualization and sampling-based visualization. We evaluate the proposed method on multiple real-world multivariate scientific datasets. To demonstrate the efficacy of the framework in an in situ environment, we apply it to a large-scale flow simulation.
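The general idea of a copula-based data summary can be illustrated with a small sketch. The abstract does not say which copula family or marginal representation the framework uses, so the following assumes a Gaussian copula over two variables, with each marginal stored as a short quantile table. The function names (`fit_summary`, `sample_summary`) and the `bins` parameter are hypothetical and are not taken from the paper.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for score transforms


def quantile_table(values, bins=32):
    """Compact marginal: bins + 1 evenly spaced empirical quantiles."""
    s = sorted(values)
    n = len(s)
    return [s[round(k * (n - 1) / bins)] for k in range(bins + 1)]


def inverse_cdf(table, u):
    """Piecewise-linear inverse CDF over a quantile table, for u in [0, 1]."""
    pos = u * (len(table) - 1)
    lo = min(int(pos), len(table) - 2)
    frac = pos - lo
    return table[lo] + frac * (table[lo + 1] - table[lo])


def normal_scores(values):
    """Rank-transform values to (0, 1), then map to standard normal scores."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = _STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def fit_summary(xs, ys, bins=32):
    """Assumed Gaussian copula: two marginals plus one correlation value."""
    zx, zy = normal_scores(xs), normal_scores(ys)
    n = len(xs)
    mx, my = sum(zx) / n, sum(zy) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(zx, zy))
    vx = sum((a - mx) ** 2 for a in zx)
    vy = sum((b - my) ** 2 for b in zy)
    rho = cov / (vx * vy) ** 0.5
    return {"x": quantile_table(xs, bins),
            "y": quantile_table(ys, bins),
            "rho": rho}


def sample_summary(summary, count, rng):
    """Draw correlated pairs from the summary without the raw data."""
    rho = summary["rho"]
    # Clamp guards against a tiny negative value from rounding.
    scale = max(0.0, 1.0 - rho * rho) ** 0.5
    draws = []
    for _ in range(count):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)
        u1, u2 = _STD_NORMAL.cdf(z1), _STD_NORMAL.cdf(z2)
        draws.append((inverse_cdf(summary["x"], u1),
                      inverse_cdf(summary["y"], u2)))
    return draws
```

In this sketch the summary for one partition holds two quantile tables and a single correlation value, regardless of how many raw samples the partition contained. Separating the marginals from the dependence structure in this way is what allows a copula-based summary to stay compact; a sampling-based reconstruction would then call `sample_summary` per partition.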
Pages: 1214-1224
Page count: 11