Prov-Dominoes: An approach for knowledge discovery from provenance data

Cited by: 0
Authors
Alencar, Victor [1]
Kohwalter, Troy [2]
Braganholo, Vanessa [2]
Da Silva Junior, Jose Ricardo [3, 4]
Murta, Leonardo [2]
Affiliations
[1] CASNAV, Brazilian Navy, Rio De Janeiro, RJ, Brazil
[2] Univ Fed Fluminense, Inst Computacao, Niteroi, RJ, Brazil
[3] IFRJ, Dept Computacao, Niteroi, RJ, Brazil
[4] Inst Fed Rio Janeiro, Niteroi, RJ, Brazil
Keywords
Knowledge discovery; Data analysis; Provenance; GPU computing; Visualization; Model
DOI
10.1016/j.eswa.2023.123030
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Provenance has become increasingly relevant to understanding, auditing, and reproducing computational tasks. The provenance analysis processes can often be overwhelming to the user due to the large volume of data, the multiple relationships among data, and the implicit information buried in the data. Existing provenance analysis tools use either visual exploration (which is overwhelming for large provenance graphs) or do not support the exploration of implicit provenance data, such as the inferences of the PROV Data Model Constraints. To fill in this gap, we introduce Prov-Dominoes, a tool designed to interactively enable knowledge discovery on provenance data. Prov-Dominoes promotes the provenance relationships among entities, activities, and agents into first-class elements represented by domino tiles. It allows users to combine and compose such domino tiles visually and interactively, using GPU. The benefits of Prov-Dominoes are three-fold: first, it uses matrices to display provenance data, which is more compact than graphs; second, it allows users to easily explore implicit information; third, it is capable of efficiently processing large datasets using GPUs. We evaluated Prov-Dominoes over distinct case studies, allowing the observation of Prov-Dominoes in action. We also evaluated the performance of sequential combinations executed in Prov-Dominoes when dealing with provenance data with thousands of relations, contrasting their executions in GPU and CPU. The results showed that, for a large dataset, GPU was more than a hundred times faster than CPU.
Pages: 17
Related papers
50 in total
  • [41] Supervised knowledge discovery from incomplete data
    Kalousis, A
    Hilario, M
    DATA MINING II, 2000, 2 : 269 - 278
  • [42] Knowledge discovery from transportation network data
    Jiang, W
    Vaidya, J
    Balaporia, Z
    Clifton, C
    Banich, B
    ICDE 2005: 21ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2005, : 1061 - 1072
  • [43] From base data to knowledge discovery – A life cycle approach – Using multilayer networks
    Santra A.
    Komar K.
    Bhowmick S.
    Chakravarthy S.
    Data and Knowledge Engineering, 2022, 141
  • [44] Interval-valued fuzzy predicates from labeled data: An approach to data classification and knowledge discovery
    Comas, Diego S.
    Meschino, Gustavo J.
    Ballarin, Virginia L.
    INFORMATION SCIENCES, 2025, 707
  • [45] Discovery of process models from data and domain knowledge: A rough-granular approach
    Skowron, Andrzej
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2007, 4815 : 192 - 197
  • [46] Provenance Network Analytics: An approach to data analytics using data provenance
    Trung Dong Huynh
    Ebden, Mark
    Fischer, Joel
    Roberts, Stephen
    Moreau, Luc
    DATA MINING AND KNOWLEDGE DISCOVERY, 2018, 32 (03) : 708 - 735
  • [47] Data Warehouse Design For Knowledge Discovery From Healthcare Data
    Ahmed, Aftab
    Zafar, Kashif
    Siddiqui, Abdul Basit
    Abdullah, Umair
    WORLD CONGRESS ON ENGINEERING - WCE 2013, VOL III, 2013, : 1589 - +
  • [48] Knowledge discovery in bridge monitoring data: A soft computing approach
    Lubasch, Peer
    Schnellenbach-Held, Martina
    Freischlad, Mark
    Buschmeyer, Wilhelm
    INTELLIGENT COMPUTING IN ENGINEERING AND ARCHITECTURE, 2006, 4200 : 428 - 436
  • [49] Provenance Network Analytics: An approach to data analytics using data provenance
    Trung Dong Huynh
    Mark Ebden
    Joel Fischer
    Stephen Roberts
    Luc Moreau
    Data Mining and Knowledge Discovery, 2018, 32 : 708 - 735
  • [50] PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems
    Han, Runzhou
    Zheng, Mai
    Byna, Suren
    Tang, Houjun
    Dong, Bin
    Dai, Dong
    Chen, Yong
    Kim, Dongkyun
    Hassoun, Joseph
    Thorsley, David
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (05) : 844 - 861