Big Data: from collection to visualization

Authors
Mohammed Ghesmoune
Hanene Azzag
Salima Benbernou
Mustapha Lebbah
Tarn Duong
Mourad Ouziri
Affiliations
[1] University of Paris 13, LIPN
[2] Sorbonne Paris City, UMR 7030
[3] University of Paris Descartes, CNRS
[4] Sorbonne Paris City, LIPADE
Source
Machine Learning | 2017 / Vol. 106
Keywords
Data fusion; RDF; Semantic; Entity resolution; Big data; Map-Reduce; Spark; Data stream clustering; Micro-Batch streaming; GNG; Topological structure; Visualization;
DOI
Not available
Abstract
Organisations increasingly rely on Big Data to discover correlations and patterns that would previously have remained hidden, and to use this new information to improve the quality of their business activities. In this paper we present a 'story' of Big Data from the initial data collection to the final visualization, passing through data fusion, analysis and clustering. To this end, we present a complete workflow covering (a) how to represent the heterogeneous collected data using the high-performance RDF language, how to fuse the Big Data in RDF by resolving the issue of entity disambiguation, and how to query those data to provide more relevant and complete knowledge; and (b) as the data are received in data streams, batchStream, a micro-batching version of the growing neural gas approach, which is capable of clustering data streams with a single pass over the data. The batchStream algorithm allows us to discover clusters of arbitrary shapes without any assumptions on the number of clusters. This Big Data workflow is implemented on the Spark platform and we demonstrate it on synthetic and real data.
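The micro-batch clustering idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' batchStream algorithm: it is a simplified, single-machine illustration in the spirit of growing neural gas, and it omits node insertion, edge ageing and node removal. The function names `nearest_two` and `micro_batch_step` and the step size `lr` are assumptions introduced here for illustration only.

```python
import math


def nearest_two(nodes, x):
    """Return the indices of the closest and second-closest prototypes to x."""
    order = sorted(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))
    return order[0], order[1]


def micro_batch_step(nodes, edges, batch, lr=0.5):
    """Process one micro-batch in a single pass: assign, link, move prototypes.

    nodes: list of prototype vectors (lists of floats), at least two
    edges: set of frozenset({i, j}) pairs linking neighbouring prototypes
    batch: list of points that arrived in this micro-batch
    lr:    hypothetical step size toward the batch mean (an assumption)
    """
    sums = {}    # node index -> per-dimension sum of its assigned points
    counts = {}  # node index -> number of points assigned to it
    for x in batch:
        first, second = nearest_two(nodes, x)
        # Link the two nearest prototypes, as growing neural gas does.
        edges.add(frozenset((first, second)))
        acc = sums.setdefault(first, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[first] = counts.get(first, 0) + 1
    # Move each winning prototype part of the way toward the mean of its points.
    for i, total in sums.items():
        mean = [s / counts[i] for s in total]
        nodes[i] = [w + lr * (m - w) for w, m in zip(nodes[i], mean)]
    return nodes, edges


# Usage: two micro-batches arriving from a toy stream.
nodes = [[0.0, 0.0], [10.0, 10.0]]
edges = set()
stream = [
    [[1.0, 1.0], [1.0, 3.0]],
    [[9.0, 9.0], [11.0, 9.0]],
]
for batch in stream:
    nodes, edges = micro_batch_step(nodes, edges, batch)
print(nodes)
```

Each point is seen exactly once, which mirrors the single-pass property described in the abstract. Because clusters are read off the graph of linked prototypes rather than from a fixed number of centres, a fuller version could recover clusters of arbitrary shape.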
Pages: 837-862
Page count: 25