Efficient Big Data Processing in Hadoop MapReduce

Cited by: 124

Authors:

Dittrich, Jens [1, 2]

Quiane-Ruiz, Jorge-Arnulfo [1]

Affiliations:

[1] Saarland Univ, Informat Syst Grp, Saarbrucken, Germany

[2] Saarland Univ, Comp Sci Databases, Saarbrucken, Germany

Source:

PROCEEDINGS OF THE VLDB ENDOWMENT | 2012 / Vol. 5 / No. 12

DOI:

10.14778/2367502.2367562

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This tutorial is motivated by the clear need of many organizations, companies, and researchers to deal with big data volumes efficiently. Examples include web analytics applications, scientific applications, and social networks. A popular data processing engine for big data is Hadoop MapReduce. Early versions of Hadoop MapReduce suffered from severe performance problems. Today, this is becoming history. There are many techniques that can be used with Hadoop MapReduce jobs to boost performance by orders of magnitude. In this tutorial we teach such techniques. First, we will briefly familiarize the audience with Hadoop MapReduce and motivate its use for big data processing. Then, we will focus on different data management techniques, going from job optimization to physical data organization like data layouts and indexes. Throughout this tutorial, we will highlight the similarities and differences between Hadoop MapReduce and Parallel DBMS. Furthermore, we will point out unresolved research problems and open issues.

Pages: 2014-2015

Page count: 2

