Dache: A Data Aware Caching for Big-Data Applications Using the MapReduce Framework

Cited by: 26
Authors
Zhao, Yaxiong [1, 2]
Wu, Jie [2 ]
Liu, Cong [3 ]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
[2] Temple Univ, Philadelphia, PA 19122 USA
[3] Sun Yat Sen Univ, Guangzhou 510275, Guangdong, Peoples R China
Keywords
big-data; MapReduce; Hadoop; caching
DOI
10.1109/TST.2014.6733207
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The buzzword big-data refers to large-scale distributed data processing applications that operate on exceptionally large amounts of data. Google's MapReduce and Apache's Hadoop, its open-source implementation, are the de facto software systems for big-data applications. One observation about the MapReduce framework is that it generates a large amount of intermediate data. This abundant information is thrown away after the tasks finish, because MapReduce is unable to utilize it. In this paper, we propose Dache, a data-aware cache framework for big-data applications. In Dache, tasks submit their intermediate results to the cache manager. A task queries the cache manager before executing the actual computing work. A novel cache description scheme and a cache request and reply protocol are designed. We implement Dache by extending Hadoop. Testbed experiment results demonstrate that Dache significantly improves the completion time of MapReduce jobs.
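The query-then-submit flow described in the abstract can be illustrated with a minimal, single-process sketch. This is not the Dache implementation or any Hadoop API: the names CacheManager, run_task, and word_count are hypothetical, and keying a cached result by an (input_id, operation) pair is an assumption made for illustration, since the abstract does not spell out the cache description scheme.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple


class CacheManager:
    """Hypothetical in-memory stand-in for a cache manager (not Dache's code)."""

    def __init__(self) -> None:
        # Assumed key: (identifier of the input split, name of the operation).
        self._store: Dict[Tuple[Hashable, str], dict] = {}

    def query(self, input_id: Hashable, operation: str) -> Optional[dict]:
        """Return the cached intermediate result, or None on a miss."""
        return self._store.get((input_id, operation))

    def submit(self, input_id: Hashable, operation: str, result: dict) -> None:
        """Store an intermediate result so later tasks can reuse it."""
        self._store[(input_id, operation)] = result


def run_task(
    cache: CacheManager,
    input_id: Hashable,
    operation: str,
    records: List[str],
    compute: Callable[[List[str]], dict],
) -> dict:
    """Query the cache first; compute and submit only on a miss."""
    cached = cache.query(input_id, operation)
    if cached is not None:
        return cached
    result = compute(records)
    cache.submit(input_id, operation, result)
    return result


def word_count(records: List[str]) -> dict:
    """Example map-side computation: count words in the given lines."""
    counts: Dict[str, int] = {}
    for line in records:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

Calling run_task twice with the same input_id and operation runs word_count only once; the second call is answered from the cache. A real deployment would also need a distributed store, invalidation, and cache lifetime management, none of which this sketch attempts.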
Pages: 39-50
Number of pages: 12
Related papers
50 entries in total
  • [11] On the Timed Analysis of Big-Data Applications
    Marconi, Francesco
    Quattrocchi, Giovanni
    Baresi, Luciano
    Bersani, Marcello M.
    Rossi, Matteo
    NASA FORMAL METHODS, NFM 2018, 2018, 10811 : 315 - 332
  • [12] A FAST BIG DATA COLLECTION SYSTEM USING MAPREDUCE FRAMEWORK
    Bing, Li
    Chan, Keith C. C.
    2014 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2014, : 530 - 535
  • [13] Feature Selection and Classification of Big Data Using MapReduce Framework
    Devi, D. Renuka
    Sasikala, S.
    INTELLIGENT COMPUTING, INFORMATION AND CONTROL SYSTEMS, ICICCS 2019, 2020, 1039 : 666 - 673
  • [14] Preemption-aware planning on Big-Data Systems
    Rabozzi, Marco
    Mazzucchelli, Matteo
    Cordone, Roberto
    Fumarola, Giovanni Matteo
    Santambrogio, Marco D.
    ACM SIGPLAN NOTICES, 2016, 51 (08) : 399 - 400
  • [15] Big-Data in Climate Change Models - A novel approach with Hadoop MapReduce
    Loaiza, Juan Manuel Carmona
    Giuliani, Graziano
    Fiameni, Giuseppe
    2017 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2017, : 45 - 50
  • [16] Profiling Memory Vulnerability of Big-data Applications
    Rameshan, N.
    Birke, R.
    Navarro, L.
    Vlassov, V.
    Urgaonkar, B.
    Kesidis, G.
    Schmatz, M.
    Chen, L. Y.
    2016 46TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS WORKSHOPS (DSN-W), 2016, : 258 - 261
  • [17] Adaptive Cache Deploying Architecture Using Big-Data Framework for CDN
    Ku, Tai-Yeon
    Won, Hee-Sun
    Choi, Hoon
    2015 INTERNATIONAL CONFERENCE ON ICT CONVERGENCE (ICTC), 2015, : 1232 - 1236
  • [18] KASR: A Keyword-Aware Service Recommendation Method on MapReduce for Big Data Applications
    Meng, Shunmei
    Dou, Wanchun
    Zhang, Xuyun
    Chen, Jinjun
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (12) : 3221 - 3231
  • [19] A Big Data Prediction Framework for Weather Forecast Using MapReduce Algorithm
    Adam, Khalid
    Majid, Mazlina Abdul
    Fakherldin, Mohammed Adam Ibrahim
    Zain, Jasni Mohamed
    ADVANCED SCIENCE LETTERS, 2017, 23 (11) : 11138 - 11143
  • [20] A Paralleled Big Data Algorithm with MapReduce Framework for Mining Twitter Data
    Li Bing
    Chan, Keith C. C.
    2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING (BDCLOUD), 2014, : 121 - 128