On using MapReduce to scale algorithms for Big Data analytics: a case study

Cited by: 4
Authors
Kijsanayothin, Phongphun [1]
Chalumporn, Gantaphon [2]
Hewett, Rattikorn [2]
Affiliations
[1] Naresuan Univ, Dept Elect & Comp Engn, NU, Phitsanulok, Thailand
[2] Texas Tech Univ, Dept Comp Sci, TTU, Lubbock, TX 79409 USA
Keywords
Big Data analytics algorithms; Association rules mining; MapReduce; Parallel computing; A-PRIORI ALGORITHM; PARALLEL; MODEL;
DOI
10.1186/s40537-019-0269-1
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Introduction: Many data analytics algorithms were originally designed for in-memory data. Parallel and distributed computing is a natural first remedy for scaling these algorithms into "Big algorithms" for large-scale data. Many advances in Big Data analytics algorithms build on MapReduce, a programming paradigm that enables parallel and distributed processing of massive data on large clusters of machines. Much research has focused on building efficient naive MapReduce-based algorithms or on extending MapReduce mechanisms to enhance performance. However, we argue that these should not be the only research directions to pursue. We conjecture that when naive MapReduce-based solutions do not perform well, it could be because certain classes of algorithms are not amenable to the MapReduce model, and one should find a fundamentally different approach to a new MapReduce-based solution.
Case description: This paper presents a case study of scaling a popular association rule mining algorithm into a "Big algorithm", specifically the development of the Apriori algorithm in the MapReduce model.
Discussion and evaluation: Formal and empirical illustrations are used to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture, and our study shows promising results compared with the state-of-the-art performer, with a 7% average increase in performance over transaction counts ranging from 10,000 to 120,000.
Conclusions: The results confirm that an effective MapReduce implementation should avoid dependent iterations, such as those of the original sequential Apriori algorithm. These findings could lead to many more alternative non-naive MapReduce-based "Big algorithms".
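The "dependent iterations" mentioned in the abstract can be seen in a minimal sketch of the textbook sequential Apriori algorithm. This is not the authors' MapReduce algorithm or any prior solution they compare against; the function name `apriori`, its parameters, and the absolute-count support threshold are assumptions made for illustration only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    Illustrative textbook version; min_support is an absolute count (assumption).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Pass k depends on pass k-1: candidates are joined from the
        # frequent (k-1)-itemsets found in the previous iteration.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Full scan of the data to count support for the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Each pass of the `while` loop needs the output of the previous pass and rescans the whole data set, so the passes cannot run concurrently. A straightforward port would plausibly turn each pass into its own MapReduce job, which is the kind of iteration dependence the conclusions advise against.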
Pages: 20
Related papers
50 in total
  • [21] Impact of big data analytics on banking: a case study
    He, Wu
    Hung, Jui-Long
    Liu, Lixin
    JOURNAL OF ENTERPRISE INFORMATION MANAGEMENT, 2023, 36 (02) : 459 - 479
  • [22] Big Data Analytics and IoT in logistics: a case study
    Hopkins, John
    Hawking, Paul
    INTERNATIONAL JOURNAL OF LOGISTICS MANAGEMENT, 2018, 29 (02) : 575 - 591
  • [23] Big Data analytics and facilities management: a case study
    Yang, Eunhwa
    Bayapu, Ipsitha
    FACILITIES, 2020, 38 (3/4) : 268 - 281
  • [24] Biological Big Data Analytics: Challenges and Algorithms
    Rajasekaran, Sanguthevar
    2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 1 - 1
  • [25] The Power of Big Data and Data Analytics for AMI Data: A Case Study
    Sidney Guerrero-Prado, Jenniffer
    Alfonso-Morales, Wilfredo
    Caicedo-Bravo, Eduardo
    Zayas-Perez, Benjamin
    Espinosa-Reza, Alfredo
    SENSORS, 2020, 20 (11) : 1 - 27
  • [26] A Theoretical Model for Big Data Analytics using Machine Learning Algorithms
    Sheshasaayee, Ananthi
    Lakshmi, J. V. N.
    PROCEEDING OF THE THIRD INTERNATIONAL SYMPOSIUM ON WOMEN IN COMPUTING AND INFORMATICS (WCI-2015), 2015, : 635 - 639
  • [27] A Hadoop/MapReduce based platform for supporting health big data analytics
    Kuo A.
    Chrimes D.
    Qin P.
    Zamani H.
    Studies in Health Technology and Informatics, 2019, 257 : 229 - 235
  • [28] AMPO: Algorithm for MapReduce Performance Optimization for Enhancing Big Data Analytics
    Yambem, Nandita
    Nandakumar, A. N.
    2017 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER, AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2017, : 717 - 723
  • [29] An Enhanced Memetic Algorithm for Feature Selection in Big Data Analytics with MapReduce
    Ramakrishnan, Umanesan
    Nachimuthu, Nandhagopal
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 31 (03) : 1547 - 1559
  • [30] A MapReduce Cortical Algorithms Implementation for Unsupervised Learning of Big Data
    Hajj, Nadine
    Rizk, Yara
    Awad, Mariette
    INNS CONFERENCE ON BIG DATA 2015 PROGRAM, 2015, 53 : 327 - 334