Efficient Big Data Processing in Hadoop MapReduce

Cited by: 124
Authors
Dittrich, Jens [1, 2]
Quiane-Ruiz, Jorge-Arnulfo [1]
Affiliations
[1] Saarland Univ, Informat Syst Grp, Saarbrucken, Germany
[2] Saarland Univ, Comp Sci Databases, Saarbrucken, Germany
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2012 / Volume 5 / Issue 12
DOI
10.14778/2367502.2367562
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This tutorial is motivated by the clear need of many organizations, companies, and researchers to deal with big data volumes efficiently. Examples include web analytics applications, scientific applications, and social networks. A popular data processing engine for big data is Hadoop MapReduce. Early versions of Hadoop MapReduce suffered from severe performance problems. Today, this is becoming history. There are many techniques that can be used with Hadoop MapReduce jobs to boost performance by orders of magnitude. In this tutorial we teach such techniques. First, we will briefly familiarize the audience with Hadoop MapReduce and motivate its use for big data processing. Then, we will focus on different data management techniques, going from job optimization to physical data organization like data layouts and indexes. Throughout this tutorial, we will highlight the similarities and differences between Hadoop MapReduce and Parallel DBMS. Furthermore, we will point out unresolved research problems and open issues.
Pages: 2014-2015
Number of pages: 2