A Comparison of Approaches to Large-Scale Data Analysis

Authors
Pavlo, Andrew [1 ]
Paulson, Erik
Rasin, Alexander [1 ]
Abadi, Daniel J.
DeWitt, David J.
Madden, Samuel
Stonebraker, Michael
Affiliations
[1] Brown Univ, Providence, RI 02912 USA
Funding
National Science Foundation (USA)
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system's performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than for the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.
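The abstract's remark that the basic control flow of MR already exists in parallel SQL systems can be illustrated with a small sketch. It is only a single-process toy, not the benchmark or any system evaluated in the paper: the records, the `visits` table, its column names, and the helper functions are all hypothetical, and SQLite stands in for a parallel DBMS purely to show the declarative form of the same grouped aggregation.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy records: (group_key, amount). Not data from the paper.
RECORDS = [("a", 5), ("b", 3), ("a", 7), ("c", 1), ("b", 4)]


def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    key, amount = record
    yield key, amount


def shuffle(pairs):
    """Shuffle: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def run_mapreduce(records):
    """Run map, shuffle, and reduce in sequence in one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


def run_sql(records):
    """Express the same aggregation declaratively with GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (group_key TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT group_key, SUM(amount) FROM visits GROUP BY group_key"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(run_mapreduce(RECORDS))
    print(run_sql(RECORDS))
```

Both functions compute the same per-key totals. In the MR form the grouping and aggregation are written by hand, while in the SQL form they are stated in the query and left to the engine, which is one side of the development-complexity comparison the abstract mentions.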
Pages: 165-178
Page count: 14