Big Data: from collection to visualization

Cited by: 0
Authors
Mohammed Ghesmoune
Hanene Azzag
Salima Benbernou
Mustapha Lebbah
Tarn Duong
Mourad Ouziri
Affiliations
[1] University of Paris 13, LIPN
[2] Sorbonne Paris City, UMR 7030
[3] University of Paris Descartes, CNRS
[4] Sorbonne Paris City, LIPADE
Source
Machine Learning | 2017 / Vol. 106
Keywords
Data fusion; RDF; Semantic; Entity resolution; Big data; Map-Reduce; Spark; Data stream clustering; Micro-Batch streaming; GNG; Topological structure; Visualization
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Organisations increasingly rely on Big Data to discover correlations and patterns that would previously have remained hidden, and to use this new information to improve the quality of their business activities. In this paper we present a ‘story’ of Big Data from initial data collection to final visualization, passing through data fusion, analysis and clustering. To this end, we present a complete workflow covering (a) how to represent the heterogeneous collected data using the high-performance RDF language, how to fuse Big Data in RDF by resolving the issue of entity disambiguation, and how to query those data to provide more relevant and complete knowledge; and (b) since the data arrive as data streams, batchStream, a micro-batching version of the growing neural gas (GNG) approach, which clusters data streams in a single pass over the data. The batchStream algorithm discovers clusters of arbitrary shape without any assumption on the number of clusters. This Big Data workflow is implemented on the Spark platform and demonstrated on synthetic and real data.
Pages: 837 - 862
Number of pages: 25
Related papers
50 in total
  • [31] Data Visualization and Statistical Literacy for Open and Big Data
    Shanmugam, Ramalingam
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2020
  • [32] Fisheries data management systems in the NW Mediterranean: from data collection to web visualization
    Ribera-Altimir, Jordi
    Llorach-To, Gerard
    Sala-Coromina, Joan
    Company, Joan B.
    Galimany, Eve
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2023, 2023
  • [33] Data Visualization and Statistical Graphics in Big Data Analysis
    Cook, Dianne
    Lee, Eun-Kyung
    Majumder, Mahbubul
    ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 3, 2016, 3 : 133 - 159
  • [34] Event graph based contradiction recognition from big data collection
    Liu, Maofu
    Wang, Limin
    Nie, Liqiang
    Dai, Jianhua
    Ji, Donghong
    NEUROCOMPUTING, 2016, 181 : 64 - 75
  • [35] A Framework for the Efficient Collection of Big Data from Online Social Networks
    Petrillo, Umberto Ferraro
    Consolo, Stefano
    2014 INTERNATIONAL CONFERENCE ON INTELLIGENT NETWORKING AND COLLABORATIVE SYSTEMS (INCOS), 2014, : 34 - 41
  • [36] Big Data Analytics and Visualization in Traffic Monitoring
    Bachechi, Chiara
    Po, Laura
    Rollo, Federica
    BIG DATA RESEARCH, 2022, 27
  • [37] A Method to Visualization Data Collection by Using Gamification
    Yampray, Karuna
    Inchamnan, Wilawan
    2019 17TH INTERNATIONAL CONFERENCE ON ICT AND KNOWLEDGE ENGINEERING (ICT&KE), 2019, : 143 - 146
  • [38] Big Data visualization: Review of techniques and datasets
    Velazquez Pena, Luis Eder
    Rodriguez Mazahua, Lisbeth
    Alor Hernandez, Giner
    Olivares Zepahua, Beatriz Alejandra
    Pelaez Camarena, S. Gustavo
    Machorro Cano, Isaac
    2017 6TH INTERNATIONAL CONFERENCE ON SOFTWARE PROCESS IMPROVEMENT (CIMPS), 2017
  • [39] DEEPEYE: An Automatic Big Data Visualization Framework
    Xuedi Qin
    Yuyu Luo
    Nan Tang
    Guoliang Li
    Big Data Mining and Analytics, 2018, (01) : 75 - 82
  • [40] A System for Monitoring and Visualization of Big Mobility Data
    Meskovic, E.
    Osmanovic, D.
    2018 41ST INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2018, : 1086 - 1091