A MapReduce-based scalable discovery and indexing of structured big data

Cited by: 23
Authors
Singh, Hari [1]
Bawa, Seema [1]
Affiliations
[1] Thapar Univ, Comp Sci & Engn Dept, Patiala, Punjab, India
Keywords
Hadoop; Distributed computing; MapReduce; HDFS; Cluster; B-Tree
DOI
10.1016/j.future.2017.03.028
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Various methods and techniques have been proposed in the past for improving the performance of queries on structured and unstructured data. This paper proposes a parallel B-Tree index in the MapReduce framework that improves the efficiency of random reads over existing approaches. The benefit of the MapReduce framework is that it hides the complexity of implementing parallelism and fault tolerance from users and exposes both in a user-friendly way. The proposed index reduces the number of data accesses for range queries and thus improves efficiency. The B-Tree index on MapReduce is implemented as a chained-MapReduce process, which reduces the intermediate data access time between successive map and reduce functions and further improves efficiency. Finally, five performance metrics are used to validate the proposed index for range search queries in MapReduce, including the effect of varying cluster size and range search query coverage on execution time, the number of map tasks, and the size of input/output (I/O) data. The effect of varying the Hadoop Distributed File System (HDFS) block size, together with an analysis of heap memory size and of the intermediate data generated during the map and reduce functions, also shows the superiority of the proposed index. Experimental results show that the parallel B-Tree index in a chained-MapReduce environment outperforms both Hadoop's default non-indexed dataset and the B-Tree-like Global Index (Zhao et al., 2012) in MapReduce. © 2017 Elsevier B.V. All rights reserved.
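The abstract's claim that an index reduces the number of data accesses for range queries can be illustrated with a minimal single-process sketch. This is not the authors' implementation: a flat sorted index over per-block key ranges stands in for the B-Tree, plain Python lists stand in for HDFS blocks, and every function name here is hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_block_index(blocks):
    """Record the min and max key of each block.

    Assumption: blocks are non-empty, internally sorted, and
    non-overlapping, so both lists come out sorted. A real B-Tree
    would organise these boundaries hierarchically.
    """
    mins = [block[0] for block in blocks]
    maxs = [block[-1] for block in blocks]
    return mins, maxs


def range_query_indexed(blocks, index, lo, hi):
    """Return (matching keys, blocks read) using the index to skip blocks."""
    mins, maxs = index
    first = bisect_left(maxs, lo)   # first block whose max key >= lo
    last = bisect_right(mins, hi)   # one past the last block whose min key <= hi
    hits = []
    for block in blocks[first:last]:
        hits.extend(k for k in block if lo <= k <= hi)
    return hits, max(0, last - first)


def range_query_scan(blocks, lo, hi):
    """Return (matching keys, blocks read) by reading every block."""
    hits = [k for block in blocks for k in block if lo <= k <= hi]
    return hits, len(blocks)


if __name__ == "__main__":
    # Hypothetical dataset: keys 0..99 split into 10 blocks of 10 keys.
    blocks = [list(range(start, start + 10)) for start in range(0, 100, 10)]
    index = build_block_index(blocks)
    print(range_query_indexed(blocks, index, 25, 47)[1])  # blocks read with index
    print(range_query_scan(blocks, 25, 47)[1])            # blocks read without index
```

On this toy dataset the query for keys 25 to 47 reads 3 blocks with the index and all 10 without it, while both return the same 23 keys. The distributed and chained-MapReduce aspects described in the abstract are outside the scope of this sketch.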
Pages: 32-43
Number of pages: 12