A MapReduce-based scalable discovery and indexing of structured big data

Cited by: 23

Authors:

Singh, Hari [1]

Bawa, Seema [1]

Affiliations:

[1] Thapar Univ, Comp Sci & Engn Dept, Patiala, Punjab, India

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2017 / Vol. 73

Keywords:

Hadoop; Distributed computing; MapReduce; HDFS; Cluster; B-Tree

DOI:

10.1016/j.future.2017.03.028

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Various methods and techniques have been proposed in the past for improving the performance of queries on structured and unstructured data. This paper proposes a parallel B-Tree index in the MapReduce framework that improves the efficiency of random reads over existing approaches. The benefit of the MapReduce framework is that it hides the complexity of implementing parallelism and fault tolerance from users and exposes both in a user-friendly way. The proposed index reduces the number of data accesses for range queries and thus improves efficiency. The B-Tree index is implemented as a chained-MapReduce process, which reduces intermediate data access time between successive map and reduce functions and further improves efficiency. Finally, five performance metrics are used to validate the proposed index for range search queries in MapReduce, including the effect of varying cluster size and range search query coverage on execution time, the number of map tasks, and the size of Input/Output (I/O) data. The effect of varying the Hadoop Distributed File System (HDFS) block size, together with an analysis of heap memory size and of the intermediate data generated during map and reduce functions, also shows the superiority of the proposed index. Experimental results show that the parallel B-Tree index in a chained-MapReduce environment performs better than both the default non-indexed Hadoop dataset and the B-Tree-like Global Index (Zhao et al., 2012) in MapReduce. (C) 2017 Elsevier B.V. All rights reserved.
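
Illustrative note: the abstract's claim that an index reduces the number of data accesses for range queries can be sketched in a few lines. The following is a minimal, single-machine sketch and not the paper's implementation: it uses a sorted list of per-block minimum keys as a stand-in for the separator keys of a one-level B-Tree, plain Python lists as a stand-in for HDFS blocks, and assumes distinct keys. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_blocks(keys, block_size):
    """Sort the keys and split them into fixed-size blocks (stand-in for HDFS blocks)."""
    ordered = sorted(keys)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def build_index(blocks):
    """Keep the smallest key of each block, like separator keys in a one-level B-Tree."""
    return [block[0] for block in blocks]


def range_query_scan(blocks, low, high):
    """Non-indexed baseline: every block is read. Returns (hits, blocks_read)."""
    hits = [k for block in blocks for k in block if low <= k <= high]
    return hits, len(blocks)


def range_query_indexed(blocks, index, low, high):
    """Indexed lookup: read only blocks whose key range can overlap [low, high].

    Assumes distinct keys. Returns (hits, blocks_read).
    """
    # Last block whose minimum key is <= low (clamped to the first block).
    first = max(bisect_right(index, low) - 1, 0)
    # Exclusive upper bound: first block whose minimum key is > high.
    last = bisect_right(index, high)
    hits = []
    for block in blocks[first:last]:
        # Binary search inside the block instead of testing every key.
        start = bisect_left(block, low)
        stop = bisect_right(block, high)
        hits.extend(block[start:stop])
    return hits, max(last - first, 0)


if __name__ == "__main__":
    blocks = build_blocks(range(100), block_size=10)
    index = build_index(blocks)
    print(range_query_scan(blocks, 25, 47)[1], "blocks read without index")
    print(range_query_indexed(blocks, index, 25, 47)[1], "blocks read with index")
```

Both functions return the same keys; the difference is only in how many blocks are touched, which is the quantity the abstract says the index reduces. The parallel construction and the chained-MapReduce execution described in the abstract are outside the scope of this sketch.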


Pages: 32-43

Page count: 12

