Big Data: from collection to visualization

Cited by: 0
Authors
Mohammed Ghesmoune
Hanene Azzag
Salima Benbernou
Mustapha Lebbah
Tarn Duong
Mourad Ouziri
Affiliations
[1] University of Paris 13,LIPN
[2] Sorbonne Paris City,UMR 7030
[3] University of Paris Descartes, CNRS
[4] Sorbonne Paris City,LIPADE
Source
Machine Learning | 2017 / Vol. 106
Keywords
Data fusion; RDF; Semantic; Entity resolution; Big data; Map-Reduce; Spark; Data stream clustering; Micro-Batch streaming; GNG; Topological structure; Visualization;
DOI
Not available
Abstract
Organisations increasingly rely on Big Data to discover correlations and patterns that would previously have remained hidden, and to use this new information to improve the quality of their business activities. In this paper we present a ‘story’ of Big Data from initial data collection to final visualization, passing through data fusion, analysis and clustering. To this end, we present a complete workflow covering (a) how to represent the heterogeneous collected data using the high-performance RDF language, how to fuse the Big Data in RDF by resolving the issue of entity disambiguation, and how to query those data to provide more relevant and complete knowledge; and (b) because the data arrive as data streams, batchStream, a micro-batching version of the growing neural gas (GNG) approach that clusters data streams in a single pass over the data. The batchStream algorithm discovers clusters of arbitrary shape without any assumption on the number of clusters. This Big Data workflow is implemented on the Spark platform and demonstrated on synthetic and real data.
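For readers unfamiliar with growing neural gas, the following is a minimal sketch of the classic online GNG update, fed one micro-batch at a time. It is not the batchStream algorithm described in the abstract and does not use Spark. The class name `GrowingNeuralGas`, the method `partial_fit`, every parameter name and default value, and the synthetic two-blob data are assumptions chosen for illustration only.

```python
import random


class GrowingNeuralGas:
    """Plain online GNG; all names and defaults here are illustrative assumptions."""

    def __init__(self, dim, eps_b=0.05, eps_n=0.006, max_age=50,
                 insert_every=100, alpha=0.5, decay=0.995, seed=0):
        rng = random.Random(seed)
        # Start from two random nodes joined by a single edge.
        self.nodes = {i: [rng.random() for _ in range(dim)] for i in (0, 1)}
        self.error = {0: 0.0, 1: 0.0}
        self.edges = {frozenset((0, 1)): 0}  # edge -> age
        self.next_id, self.seen = 2, 0
        self.eps_b, self.eps_n, self.max_age = eps_b, eps_n, max_age
        self.insert_every, self.alpha, self.decay = insert_every, alpha, decay

    @staticmethod
    def _dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    def _neighbors(self, n):
        return [m for e in self.edges if n in e for m in e if m != n]

    def _move(self, n, x, rate):
        self.nodes[n] = [w + rate * (xi - w) for w, xi in zip(self.nodes[n], x)]

    def _insert(self):
        # Add a node halfway between the worst node and its worst neighbour.
        q = max(self.error, key=self.error.get)
        nbrs = self._neighbors(q)
        if not nbrs:
            return
        f = max(nbrs, key=self.error.get)
        r, self.next_id = self.next_id, self.next_id + 1
        self.nodes[r] = [(a + b) / 2 for a, b in zip(self.nodes[q], self.nodes[f])]
        del self.edges[frozenset((q, f))]
        self.edges[frozenset((q, r))] = 0
        self.edges[frozenset((r, f))] = 0
        self.error[q] *= self.alpha
        self.error[f] *= self.alpha
        self.error[r] = self.error[q]

    def update(self, x):
        # Find the nearest node (s1) and the second nearest node (s2).
        ranked = sorted(self.nodes, key=lambda n: self._dist2(self.nodes[n], x))
        s1, s2 = ranked[0], ranked[1]
        self.error[s1] += self._dist2(self.nodes[s1], x)
        # Age the winner's edges, then move the winner and its neighbours.
        for n in self._neighbors(s1):
            self.edges[frozenset((s1, n))] += 1
            self._move(n, x, self.eps_n)
        self._move(s1, x, self.eps_b)
        self.edges[frozenset((s1, s2))] = 0  # create or refresh the edge
        # Drop edges that are too old, then nodes left without any edge.
        self.edges = {e: a for e, a in self.edges.items() if a <= self.max_age}
        linked = {n for e in self.edges for n in e}
        for n in [n for n in self.nodes if n not in linked]:
            del self.nodes[n]
            del self.error[n]
        self.seen += 1
        if self.seen % self.insert_every == 0:
            self._insert()
        for n in self.error:
            self.error[n] *= self.decay

    def partial_fit(self, batch):
        """Consume one micro-batch; each point is visited exactly once."""
        for x in batch:
            self.update(x)


# Synthetic stream: two Gaussian blobs, delivered in micro-batches of 50 points.
rng = random.Random(1)
centers = [(0.2, 0.2), (0.8, 0.8)]
points = [[c + rng.gauss(0, 0.05) for c in rng.choice(centers)] for _ in range(1000)]

gng = GrowingNeuralGas(dim=2)
for start in range(0, len(points), 50):
    gng.partial_fit(points[start:start + 50])
print(len(gng.nodes), "nodes,", len(gng.edges), "edges")
```

The nodes and edges that remain after the stream form a graph whose connected components can be read as clusters, which is how a GNG-style model avoids fixing the number of clusters in advance.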
Pages: 837-862
Page count: 25
Related papers
50 in total
  • [1] Big Data: from collection to visualization
    Ghesmoune, Mohammed
    Azzag, Hanene
    Benbernou, Salima
    Lebbah, Mustapha
    Duong, Tarn
    Ouziri, Mourad
    MACHINE LEARNING, 2017, 106 (06) : 837 - 862
  • [2] Visualization of (multimedia) dependencies from big data
    Caruccio, Loredana
    Deufemia, Vincenzo
    Polese, Giuseppe
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (23) : 33151 - 33167
  • [3] Visualization of (multimedia) dependencies from big data
    Loredana Caruccio
    Vincenzo Deufemia
    Giuseppe Polese
    Multimedia Tools and Applications, 2019, 78 : 33151 - 33167
  • [4] Visualization of Big Data
    Kung, Sun-Yuan
    PROCEEDINGS OF 2015 IEEE 14TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2015, : 447 - 448
  • [5] The Big Picture for Big Data: Visualization
    Shneiderman, Ben
    SCIENCE, 2014, 343 (6172) : 730 - 730
  • [6] Big Data, Big Picture - Data Visualization of Health
    Bourke, Alison
    Ryan, Patrick B.
    Elhadad, Noemie
    Perer, Adam
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2016, 25 : 48 - 48
  • [7] Interactive Visualization of Big Data
    Godfrey, Parke
    Gryz, Jarek
    Lasek, Piotr
    Razavi, Nasim
    BEYOND DATABASES, ARCHITECTURES AND STRUCTURES, BDAS 2016, 2016, 613 : 3 - 22
  • [8] Big-Data Visualization
    Keim, Daniel
    Qu, Huamin
    Ma, Kwan-Liu
    IEEE COMPUTER GRAPHICS AND APPLICATIONS, 2013, 33 (04) : 20 - 21
  • [9] BIG DATA IMPLEMENTATION AND VISUALIZATION
    Gupta, Deepa
    Siddiqui, Sameera
    2014 INTERNATIONAL CONFERENCE ON ADVANCES IN ENGINEERING AND TECHNOLOGY RESEARCH (ICAETR), 2014,
  • [10] Efficacy of Bluetooth-Based Data Collection for Road Traffic Analysis and Visualization Using Big Data Analytics
    Kulkarni, Ashish Rajeshwar
    Kumar, Narendra
    Rao, K. Ramachandra
    BIG DATA MINING AND ANALYTICS, 2023, 6 (02) : 139 - 153