Efficient Big Data Processing in Hadoop MapReduce

Cited by: 124
Authors
Dittrich, Jens [1 ,2 ]
Quiane-Ruiz, Jorge-Arnulfo [1 ]
Affiliations
[1] Saarland Univ, Informat Syst Grp, Saarbrucken, Germany
[2] Saarland Univ, Comp Sci Databases, Saarbrucken, Germany
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2012 / Vol. 5 / Issue 12
DOI
10.14778/2367502.2367562
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This tutorial is motivated by the clear need of many organizations, companies, and researchers to deal with big data volumes efficiently. Examples include web analytics applications, scientific applications, and social networks. A popular data processing engine for big data is Hadoop MapReduce. Early versions of Hadoop MapReduce suffered from severe performance problems. Today, this is becoming history. There are many techniques that can be used with Hadoop MapReduce jobs to boost performance by orders of magnitude. In this tutorial we teach such techniques. First, we will briefly familiarize the audience with Hadoop MapReduce and motivate its use for big data processing. Then, we will focus on different data management techniques, going from job optimization to physical data organization like data layouts and indexes. Throughout this tutorial, we will highlight the similarities and differences between Hadoop MapReduce and Parallel DBMS. Furthermore, we will point out unresolved research problems and open issues.
Pages: 2014 - 2015
Page count: 2
Related papers
50 in total
  • [41] Processing of Big Educational Data in the Cloud Using Apache Hadoop
    Machova, Renata
    Komarkova, Jitka
    Lnenicka, Martin
    INTERNATIONAL CONFERENCE ON INFORMATION SOCIETY (I-SOCIETY 2016), 2016, : 46 - 49
  • [42] Towards Efficient Big Data Storage With MapReduce Deduplication System
    Joe, Vijesh
    Raj, Jennifer S.
    Smys, S.
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND WEB ENGINEERING, 2021, 16 (02) : 45 - 57
  • [43] A Performance Analysis of MapReduce Task with Large Number of Files Dataset in Big Data Using Hadoop
    Pal, Amrit
    Agrawal, Pinki
    Jain, Kunal
    Agrawal, Sanjay
    2014 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT), 2014, : 587 - 591
  • [44] Clustering of Association Rules for Big Datasets using Hadoop MapReduce
    Moahmmed, Salahadin A.
    Alasow, Mohamed A.
    El-Alfy, El-Sayed M.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (03) : 536 - 545
  • [45] Data Analysis using Hadoop MapReduce Environment
    Merla, PrathyushaRani
    Liang, Yiheng
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 4783 - 4785
  • [46] Data cube computational model with hadoop mapreduce
    Wang, Bo
    Gui, Hao
    Roantree, Mark
    O'Connor, Martin F.
    WEBIST 2014 - Proceedings of the 10th International Conference on Web Information Systems and Technologies, 2014, 1 : 193 - 199
  • [47] Trust-Based Scheduling Framework for Big Data Processing with MapReduce
    Thanh Dat Dang
    Doan Hoang
    Nguyen, Diep N.
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (01) : 279 - 293
  • [48] Multi-objective scheduling of MapReduce jobs in big data processing
    Hashem, Ibrahim Abaker Targio
    Anuar, Nor Badrul
    Marjani, Mohsen
    Gani, Abdullah
    Sangaiah, Arun Kumar
    Sakariyah, Adewole Kayode
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (08) : 9979 - 9994
  • [49] Processing Geo-Dispersed Big Data in an Advanced MapReduce Framework
    Zhang, Hongli
    Zhang, Qiang
    Zhou, Zhigang
    Du, Xiaojiang
    Yu, Wei
    Guizani, Mohsen
    IEEE NETWORK, 2015, 29 (05) : 24 - 30
  • [50] Multi-objective scheduling of MapReduce jobs in big data processing
    Ibrahim Abaker Targio Hashem
    Nor Badrul Anuar
    Mohsen Marjani
    Abdullah Gani
    Arun Kumar Sangaiah
    Adewole Kayode Sakariyah
    Multimedia Tools and Applications, 2018, 77 : 9979 - 9994