Analysis of Massive Industrial Data using MapReduce Framework for Parallel Processing

Cited by: 0
Authors
Aly, Mohab [1]
Yacout, Soumaya [1]
Shaban, Yasser [2]
Affiliations
[1] Ecole Polytech Montreal, Dept Ind Engn, CP 6079,Succ Ctr Ville, Montreal, PQ H3C 3A7, Canada
[2] Helwan Univ, Dept Mech Design Engn, POB 11718, Cairo, Egypt
Keywords
Cloud Computing; Big Data; MapReduce; Parallel Processing; Data Mining
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
With the emergence of the 'Big Data' paradigm, more and more industrial data are becoming available to practitioners and professionals, and advances in information technology mean that these data are generated ever faster. For reliability and maintenance engineers, 'Big Data' is a valuable source of information: if analyzed correctly, it can yield a knowledge base that supports decision making in an industrial organization. The availability of 'Big Data' is opening a new area of research dedicated to the analysis of such data. This paper shows how to analyze the massive amounts of data generated by one or more industrial systems. These data may range from terabytes to petabytes in size, and analyzing them on a single commodity computer is impractical: the data may not fit into the machine's memory, and even if they do, processing would take an unacceptable amount of time. Processing industrial data at this scale therefore requires high-performance analytical systems running in distributed environments, and different algorithms can be considered for the analysis. Cloud Computing models provide the scalable and flexible infrastructure needed to run standard analytics algorithms in a distributed manner. We introduce a new distributed training technique that combines MapReduce, a widely used framework for large-scale data flow, with traditional machine learning building blocks such as matrix multiplication and linear regression. Parallelizing each of these computations requires adapting a different algorithm to the MapReduce framework.
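To make the matrix-multiplication case concrete, here is a minimal in-process Python simulation of the textbook one-pass MapReduce algorithm for C = A x B. It is not the authors' implementation: the function names (`map_entry`, `reduce_cell`, `mapreduce_matmul`) are hypothetical, and a dictionary stands in for the framework's shuffle phase.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sketch of one-pass MapReduce matrix multiplication C = A x B,
# where A is m x n and B is n x p. A dict simulates the shuffle phase.

def map_entry(name, i, j, value, m, p):
    """Mapper: emit ((row, col) of C, tagged value) pairs for one matrix entry."""
    if name == "A":
        # A[i][j] contributes to every C[i][k]; j is the shared index.
        for k in range(p):
            yield (i, k), ("A", j, value)
    else:
        # B[i][j] contributes to every C[r][j]; i is the shared index.
        for r in range(m):
            yield (r, j), ("B", i, value)

def reduce_cell(key, values):
    """Reducer: join A and B values on the shared index and sum the products."""
    a = {j: v for tag, j, v in values if tag == "A"}
    b = {j: v for tag, j, v in values if tag == "B"}
    return key, sum(a[j] * b[j] for j in a.keys() & b.keys())

def mapreduce_matmul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    # Input records: (matrix name, row, col, value) for every entry.
    records = chain(
        (("A", i, j, A[i][j]) for i in range(m) for j in range(n)),
        (("B", i, j, B[i][j]) for i in range(n) for j in range(p)),
    )
    # Map + shuffle: group intermediate values by output cell.
    groups = defaultdict(list)
    for name, i, j, v in records:
        for key, val in map_entry(name, i, j, v, m, p):
            groups[key].append(val)
    # Reduce: one call per output cell.
    C = [[0] * p for _ in range(m)]
    for key, values in groups.items():
        (i, k), total = reduce_cell(key, values)
        C[i][k] = total
    return C
```

For example, `mapreduce_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. This one-pass scheme replicates every entry of A p times and every entry of B m times during the shuffle; for large matrices, two-pass or block-partitioned variants are commonly used to reduce that traffic.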
Our platform is deployed on Google Cloud Platform (App Engine and Compute Engine); we also consider Amazon EMR services to assess how the resources provisioned by each platform can be used to speed up the analysis of massive industrial data and the extraction of useful information from it, that is, to reduce its computational time.
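The linear regression case mentioned in the abstract can be illustrated in the same spirit. A common way to fit ordinary least squares with MapReduce (assumed here; the abstract does not specify the authors' formulation) is to have each mapper compute partial sums of X^T X and X^T y over its chunk of rows, have the reducer add them, and then solve the small normal-equation system (X^T X) w = X^T y on a single machine. The sketch below simulates this in pure Python; the function names and the `(features, target)` row format are hypothetical, and Python's built-in `map` and `functools.reduce` stand in for the distributed framework.

```python
from functools import reduce

# Hypothetical sketch: least-squares regression via MapReduce-style
# aggregation of the normal equations (X^T X) w = X^T y.

def map_chunk(rows):
    """Mapper: partial X^T X and X^T y for one non-empty chunk of (features, target) rows."""
    d = len(rows[0][0]) + 1  # +1 for the intercept column
    xtx = [[0.0] * d for _ in range(d)]
    xty = [0.0] * d
    for features, target in rows:
        x = [1.0] + list(features)  # prepend the intercept term
        for a in range(d):
            xty[a] += x[a] * target
            for b in range(d):
                xtx[a][b] += x[a] * x[b]
    return xtx, xty

def reduce_pair(left, right):
    """Reducer: element-wise sum of two partial (X^T X, X^T y) results."""
    (xtx1, xty1), (xtx2, xty2) = left, right
    xtx = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(xtx1, xtx2)]
    xty = [u + v for u, v in zip(xty1, xty2)]
    return xtx, xty

def solve(matrix, rhs):
    """Solve the small d x d system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_linear_regression(chunks):
    partials = map(map_chunk, chunks)         # map phase, one call per chunk
    xtx, xty = reduce(reduce_pair, partials)  # reduce phase
    return solve(xtx, xty)                    # [intercept, w1, w2, ...]
```

For data generated by y = 2 + 3x and split into two chunks, `fit_linear_regression([[((0,), 2.0), ((1,), 5.0)], [((2,), 8.0), ((3,), 11.0)]])` recovers an intercept of about 2 and a slope of about 3. Because the partial sums are additive, `reduce_pair` could also serve as a combiner on each node, so each mapper ships only one d x d matrix and one d-vector across the network regardless of how many rows it processed.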
Pages: 6
Related papers
50 in total
  • [21] Privacy Preserving Parallel Clustering Based Anonymization for Big Data Using MapReduce Framework
    Lawrance, Josephine Usha
    Jesudhasan, Jesu Vedha Nayahi
    APPLIED ARTIFICIAL INTELLIGENCE, 2021, 35 (15) : 1587 - 1620
  • [22] Study on massive data processing of intermittent energy based on MapReduce Model
    Mei, Huawei
    Mi, Zengqiang
    Wu, Guanglei
    Dianli Xitong Zidonghua/Automation of Electric Power Systems, 2014, 38 (15) : 76 - 80
  • [23] Massive Image Data Management using HBase and MapReduce
    Liu, Yuehu
    Chen, Bin
    He, Wenxi
    Fang, Yu
    2013 21ST INTERNATIONAL CONFERENCE ON GEOINFORMATICS (GEOINFORMATICS), 2013
  • [24] Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce
    Song, Ge
    Rochas, Justine
    Huet, Fabrice
    Magoules, Frederic
    23RD EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2015), 2015 : 279 - 287
  • [25] Algorithm for Clustering Analysis of Gene Expression Data using MapReduce Framework
    Priya, P. Packia Amutha
    Lawrance, R.
    2016 INTERNATIONAL CONFERENCE ON COMPUTING TECHNOLOGIES AND INTELLIGENT DATA ENGINEERING (ICCTIDE'16), 2016
  • [26] LShape Partitioning: Parallel Skyline Query Processing Using MapReduce
    Wijayanto, Heri
    Wang, Wenlu
    Ku, Wei-Shinn
    Chen, Arbee L. P.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (07) : 3363 - 3376
  • [27] An Ontology-driven MapReduce Framework for Association Rules Mining in Massive Data
    Gahar, Rania Mkhinini
    Arfaoui, Olfa
    Sassi Hidri, Minyar
    Ben Hadj-Alouane, Nejib
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES-2018), 2018, 126 : 224 - 233
  • [28] Combiner to Reduce the Time of Processing in Trend Analysis using Hadoop's MapReduce Framework
    Pinto, Vivek Francis
    2017 2ND INTERNATIONAL CONFERENCE ON COMPUTATIONAL SYSTEMS AND INFORMATION TECHNOLOGY FOR SUSTAINABLE SOLUTION (CSITSS-2017), 2017 : 166 - 169
  • [29] Set similarity join on massive probabilistic data using MapReduce
    Ma, Youzhong
    Meng, Xiaofeng
    DISTRIBUTED AND PARALLEL DATABASES, 2014, 32 (03) : 447 - 464