On using MapReduce to scale algorithms for Big Data analytics: a case study

Times cited: 4
Authors
Kijsanayothin, Phongphun [1]
Chalumporn, Gantaphon [2]
Hewett, Rattikorn [2]
Affiliations
[1] Naresuan Univ, Dept Elect & Comp Engn, NU, Phitsanulok, Thailand
[2] Texas Tech Univ, Dept Comp Sci, TTU, Lubbock, TX 79409 USA
Keywords
Big Data analytics algorithms; Association rules mining; MapReduce; Parallel computing; A-PRIORI ALGORITHM; PARALLEL; MODEL;
DOI
10.1186/s40537-019-0269-1
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Introduction: Many data analytics algorithms were originally designed for in-memory data. Parallel and distributed computing is a natural first remedy for scaling these algorithms into "Big algorithms" for large-scale data. MapReduce, a programming paradigm that enables parallel and distributed execution of massive data processing on large clusters of machines, has contributed to advances in many Big Data analytics algorithms. Much research has focused on building efficient naive MapReduce-based algorithms or on extending MapReduce mechanisms to improve performance. However, we argue that these should not be the only research directions to pursue. We conjecture that when naive MapReduce-based solutions perform poorly, it may be because certain classes of algorithms are not amenable to the MapReduce model, and a fundamentally different approach is needed to arrive at a new MapReduce-based solution.

Case description: This paper investigates a case study of the "Big algorithms" scaling problem for a popular association rule-mining algorithm, specifically the development of the Apriori algorithm in the MapReduce model.

Discussion and evaluation: Formal and empirical illustrations are used to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture, and our study shows promising results compared with the state-of-the-art performer: a 7% increase in performance on average, over datasets ranging from 10,000 to 120,000 transactions.

Conclusions: The results confirm that an effective MapReduce implementation should avoid dependent iterations, such as those of the original sequential Apriori algorithm. These findings could lead to many more alternative, non-naive MapReduce-based "Big algorithms".
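To make the notion of "dependent iterations" concrete, below is a minimal in-memory sketch of the conventional level-wise Apriori expressed as a sequence of map/reduce rounds. It is not the authors' proposed algorithm, which this record does not describe. The functions `map_phase`, `reduce_phase`, `generate_candidates`, and `levelwise_apriori`, the toy `TRANSACTIONS` data, and the use of an absolute support count are all hypothetical choices for illustration; the shuffle step is simulated with a dictionary rather than a real MapReduce framework.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy dataset: each transaction is a set of items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def map_phase(transactions, candidates):
    """Hypothetical mapper: emit (itemset, 1) for every candidate contained in a transaction."""
    for t in transactions:
        for c in candidates:
            if c <= t:
                yield c, 1


def reduce_phase(pairs, min_support):
    """Hypothetical reducer: group by itemset (a dict stands in for the shuffle),
    sum the counts, and keep itemsets whose count reaches min_support."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return {k: v for k, v in counts.items() if v >= min_support}


def generate_candidates(frequent_prev, k):
    """Apriori candidate generation: join frequent (k-1)-itemsets into k-itemsets,
    then prune any candidate that has an infrequent (k-1)-subset."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def levelwise_apriori(transactions, min_support):
    """Run one simulated MapReduce job per itemset size k.
    Returns (frequent itemsets with counts, number of rounds executed)."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([i]) for t in transactions for i in t}  # level-1 candidates
    all_frequent = {}
    k, rounds = 1, 0
    while candidates:
        frequent = reduce_phase(map_phase(transactions, candidates), min_support)
        rounds += 1
        all_frequent.update(frequent)
        k += 1
        # Dependency: level-k candidates exist only after the level-(k-1) job finishes.
        candidates = generate_candidates(frequent.keys(), k)
    return all_frequent, rounds


if __name__ == "__main__":
    itemsets, n_rounds = levelwise_apriori(TRANSACTIONS, min_support=3)
    print(f"{n_rounds} rounds")
    for itemset, count in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

Each pass through the `while` loop stands in for one full MapReduce job, and the next job cannot begin until the previous reducer's output is available. This chain of rounds is the kind of dependent iteration that the conclusions above say an effective MapReduce implementation should avoid.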
Pages: 20