A Comparison of Approaches to Large-Scale Data Analysis

Authors
Pavlo, Andrew [1]
Paulson, Erik
Rasin, Alexander [1]
Abadi, Daniel J.
DeWitt, David J.
Madden, Samuel
Stonebraker, Michael
Affiliations
[1] Brown Univ, Providence, RI 02912 USA
Funding
National Science Foundation (USA)
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system's performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although loading data into the parallel DBMSs and tuning their execution took much longer than for the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.
Pages: 165-178
Page count: 14