Analysis of Massive Industrial Data using MapReduce Framework for Parallel Processing

Authors
Aly, Mohab [1]
Yacout, Soumaya [1]
Shaban, Yasser [2]
Affiliations
[1] Ecole Polytech Montreal, Dept Ind Engn, CP 6079,Succ Ctr Ville, Montreal, PQ H3C 3A7, Canada
[2] Helwan Univ, Dept Mech Design Engn, POB 11718, Cairo, Egypt
Source
2017 ANNUAL RELIABILITY AND MAINTAINABILITY SYMPOSIUM | 2017
Keywords
Cloud Computing; Big Data; MapReduce; Parallel Processing; Data mining;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
With the emergence of the 'Big Data' paradigm, more and more industrial data are becoming available to practitioners and professionals, and advances in information technology mean these data are generated ever faster. For reliability and maintenance engineers, 'Big Data' is a valuable source of information: analyzed correctly, it can yield a knowledge base that supports decision making in an industrial organization. Its availability is opening a new area of research dedicated to the analysis of such data. This paper shows how to analyze massive amounts of data generated by one or more industrial systems. Such data may range from terabytes to petabytes in size. Analysis at that scale cannot be performed on a single commodity computer, because the data may exceed the machine's memory and processing capacity; even if the data fit, the analysis would take an unacceptable amount of time. Processing large industrial data sets therefore requires high-performance analytical systems running in distributed environments, and different algorithms can be considered for such analysis. Cloud computing models provide the scalable and flexible infrastructure needed to adapt standard analytics algorithms to run in a distributed manner. We introduce a new distributed training technique that combines MapReduce, a now widely used framework for big-data dataflow, with traditional building blocks of machine learning such as matrix multiplication and linear regression. Parallel processing of these computations relies on different algorithms adapted to the MapReduce framework.
Our platform is deployed on Google Cloud Platform (App Engine and Compute Engine), and we also consider Amazon EMR, to see how the resources provisioned by each can be used to speed up the analysis of, and the extraction of useful information from, massive industrial data, that is, to reduce computation time.
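The abstract names linear regression as one computation adapted to MapReduce but gives no detail. As a rough illustration only, and not the authors' implementation, the sketch below fits a one-variable least-squares line in a map/reduce style: each partition is mapped to its sufficient statistics, and the reduce step sums them before solving for the slope and intercept. The function names (`map_partition`, `combine`, `fit_line`) and the toy data are assumptions made for this example; on a real cluster the map step would run on separate workers rather than in a local loop.

```python
from functools import reduce


def map_partition(rows):
    """Map step: summarize one partition of (x, y) rows as sufficient statistics."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: element-wise sum of two statistics tuples (associative)."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(partitions):
    """Fit y = slope * x + intercept from partitioned data."""
    stats = map(map_partition, partitions)          # would be distributed in practice
    n, sx, sy, sxx, sxy = reduce(combine, stats, (0.0,) * 5)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are constant or data is empty; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical toy data: points on y = 2x + 1, split across three partitions.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
slope, intercept = fit_line(partitions)
print(slope, intercept)
```

Because the reduce step is a plain element-wise sum, partitions can be combined in any order, which is the property that lets this kind of computation spread across many machines.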
Pages: 6
Related papers
50 in total
  • [11] A Parallel Processing Framework using MapReduce for Content-Based Image Retrieval
    Tungkasthan, Anucha
    Premchaiswadi, Wichian
    2013 ELEVENTH INTERNATIONAL CONFERENCE ON ICT AND KNOWLEDGE ENGINEERING (ICT&KE), 2013
  • [12] Big Data Analysis Solutions using MapReduce Framework
    Elagib, Sara B.
    Najeeb, Atahur Rahman
    Hashim, Aisha H.
    Olanrewaju, Rashidah F.
    2014 INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION ENGINEERING (ICCCE), 2014, : 127 - 130
  • [13] Scientific data processing framework for Hadoop MapReduce
    Department of Computer and Information, Xinxiang University, Xinxiang, China
    1600, Journal of Chemical and Pharmaceutical Research, 3/668 Malviya Nagar, Jaipur, Rajasthan, India (06):
  • [14] A Distributed Framework for Predictive Analytics Using Big Data and MapReduce Parallel Programming
    Natesan P.
    Sathishkumar V.E.
    Mathivanan S.K.
    Venkatasen M.
    Jayagopal P.
    Allayear S.M.
    Mathematical Problems in Engineering, 2023, 2023
  • [15] Parallel Processing of Big Data using Power Iteration Clustering over MapReduce
    Jayalatchumy, D.
    Thambidurai, P.
    Alamelu, A. Vasumathi
    2014 WORLD CONGRESS ON COMPUTING AND COMMUNICATION TECHNOLOGIES (WCCCT 2014), 2014, : 176 - 178
  • [16] Parallel Massive Data Monitoring and Processing Using Sensor Networks
    Naji, Hamid Reza
    Rezaee, Najmeh
    IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS: CYBERSECURITY AND BIG DATA, 2016, : 225 - 230
  • [17] An improved partitioning mechanism for optimizing massive data analysis using MapReduce
    Slagter, Kenn
    Hsu, Ching-Hsien
    Chung, Yeh-Ching
    Zhang, Daqiang
    JOURNAL OF SUPERCOMPUTING, 2013, 66 (01): 539 - 555
  • [19] A Parallel Framework for Processing Massive Spatial Data with a Split-and-Merge Paradigm
    Guan, Xuefeng
    Wu, Huayi
    Li, Lin
    TRANSACTIONS IN GIS, 2012, 16 (06) : 829 - 843
  • [20] A ROBUST PARALLEL FRAMEWORK FOR MASSIVE SPATIAL DATA PROCESSING ON HIGH PERFORMANCE CLUSTERS
    Guan, Xuefeng
    XXII ISPRS CONGRESS, TECHNICAL COMMISSION IV, 2012, 39-B4 : 213 - 217