Visual Bayesian Fusion to Navigate a Data Lake

Times cited: 0
Authors
Singh, Karamjit [1]
Paneri, Kaushal [1]
Pandey, Aditeya [1]
Gupta, Garima [1]
Sharma, Geetika [1]
Agarwal, Puneet [1]
Shroff, Gautam [1]
Affiliation
[1] Tata Consultancy Services Ltd, TCS Research, Gurgaon, India
Keywords: none listed
DOI: not available
Chinese Library Classification: TP301 [Theory, methods]
Discipline classification code: 081202
Abstract
The evolution from traditional business intelligence to big data analytics has seen the emergence of 'Data Lakes', in which data is ingested in raw form rather than loaded into traditional data warehouses. More and more pieces of information are becoming available about each entity of interest, e.g., a customer, often from diverse sources such as social media, mobility and the internet of things. Fusing, visualizing and deriving insights from such data poses a number of challenges. First, disparate datasets often lack a natural join key. Next, datasets may describe measures at different levels of granularity, e.g., individual vs. aggregate data. Finally, different datasets may be derived from physically distinct populations. Moreover, once data has been fused, queries are often an inefficient and inaccurate mechanism for deriving insight from high-dimensional data. In this paper we describe iFuse, a data-fusion based visual analytics platform for navigating a data lake to derive insights. We rely on Bayesian graphical models to provide a useful rudder with which to fuse and analyze disparate islands of data in a systematic manner. Our platform allows for rich interactive visualizations, querying, and keyword-based search within and across datasets or models. It also provides intuitive visual interfaces for value imputation and model-based predictions. We illustrate the use of our platform in multiple scenarios. These include two public data challenges and a real-life industry use case involving the probabilistic fusion of datasets that lack a natural join key.
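The abstract does not spell out how iFuse fuses datasets that lack a join key. The sketch below shows one standard way a Bayesian graphical model can do this. It is not the authors' implementation.

The sketch rests on these assumptions:
- Two datasets share a variable Z, here a hypothetical customer segment.
- Dataset A records Z together with an attribute X, and dataset B records Z together with an attribute Y.
- The graph X ← Z → Y holds, so X and Y are conditionally independent given Z.

Under these assumptions the joint factorizes as P(x, y) = Σ_z P(z) P(x | z) P(y | z). Conditioning that joint on X then gives a model-based imputation of Y. The toy data and the names `fuse`, `impute`, `marginal` and `conditional` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper). Dataset A pairs a shared
# segment Z with attribute X; dataset B pairs the same Z with attribute Y.
# There is no customer ID, so the two cannot be joined row by row.
DATASET_A = [("young", "urban"), ("young", "urban"), ("young", "rural"),
             ("senior", "rural"), ("senior", "rural"), ("senior", "urban")]
DATASET_B = [("young", "online"), ("young", "online"), ("young", "store"),
             ("senior", "store"), ("senior", "store"), ("senior", "online")]


def marginal(values):
    """Estimate P(v) by relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()}


def conditional(pairs):
    """Estimate P(child | parent) from (parent, child) pairs by counting."""
    counts = defaultdict(Counter)
    for parent, child in pairs:
        counts[parent][child] += 1
    return {p: {c: n / sum(cs.values()) for c, n in cs.items()}
            for p, cs in counts.items()}


def fuse(dataset_a, dataset_b):
    """Joint P(X, Y) under the assumed graph X <- Z -> Y.

    P(x, y) = sum_z P(z) P(x | z) P(y | z), i.e. X and Y are assumed
    conditionally independent given the shared variable Z.
    """
    # P(Z) is estimated from both datasets pooled together.
    p_z = marginal([z for z, _ in dataset_a] + [z for z, _ in dataset_b])
    p_x_given_z = conditional(dataset_a)
    p_y_given_z = conditional(dataset_b)
    joint = defaultdict(float)
    for z, pz in p_z.items():
        # Segments seen in only one dataset contribute no mass here.
        for x, px in p_x_given_z.get(z, {}).items():
            for y, py in p_y_given_z.get(z, {}).items():
                joint[(x, y)] += pz * px * py
    # Renormalize so the joint sums to 1 even if some segments were skipped.
    total = sum(joint.values())
    return {xy: p / total for xy, p in joint.items()}


def impute(joint, x):
    """Model-based imputation: the distribution P(Y | X = x)."""
    row = {y: p for (xx, y), p in joint.items() if xx == x}
    total = sum(row.values())
    return {y: p / total for y, p in row.items()}


JOINT = fuse(DATASET_A, DATASET_B)
for (x, y), p in sorted(JOINT.items()):
    print(f"P(X={x}, Y={y}) = {p:.3f}")
print("P(Y | X=urban):", impute(JOINT, "urban"))
```

The conditional-independence assumption carries all the weight here. If X and Y are related through anything other than Z, the fused joint will be biased, and a richer graph would be needed.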
Pages: 987-994 (8 pages)