Dache: A Data Aware Caching for Big-Data Applications Using the MapReduce Framework

Cited by: 26
Authors
Zhao, Yaxiong [1, 2]
Wu, Jie [2]
Liu, Cong [3]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
[2] Temple Univ, Philadelphia, PA 19122 USA
[3] Sun Yat Sen Univ, Guangzhou 510275, Guangdong, Peoples R China
Keywords
big-data; MapReduce; Hadoop; caching
DOI
10.1109/TST.2014.6733207
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The buzzword big-data refers to large-scale distributed data processing applications that operate on exceptionally large amounts of data. Google's MapReduce and Apache's Hadoop, its open-source implementation, are the de facto software systems for big-data applications. One observation about the MapReduce framework is that it generates a large amount of intermediate data. This abundant information is thrown away after the tasks finish, because MapReduce is unable to utilize it. In this paper, we propose Dache, a data-aware cache framework for big-data applications. In Dache, tasks submit their intermediate results to the cache manager. A task queries the cache manager before executing the actual computing work. A novel cache description scheme and a cache request and reply protocol are designed. We implement Dache by extending Hadoop. Testbed experiment results demonstrate that Dache significantly improves the completion time of MapReduce jobs.
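The submit-and-query interaction described in the abstract can be illustrated with a minimal sketch. This is not Dache's implementation or its Hadoop API: the `CacheManager` class, the `(input identifier, operation name)` key, and the `run_map_task` helper are all hypothetical names chosen for illustration, and the cache here is a plain in-memory dictionary rather than a distributed service.

```python
from typing import Dict, List, Optional, Tuple

# All names below are hypothetical; they illustrate the idea only.
CacheKey = Tuple[str, str]          # (input identifier, operation name)
MapOutput = List[Tuple[str, int]]   # intermediate key/value pairs


class CacheManager:
    """Toy in-memory stand-in for a cache manager (assumed design)."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, MapOutput] = {}
        self.hits = 0
        self.misses = 0

    def query(self, key: CacheKey) -> Optional[MapOutput]:
        """Return the cached intermediate result, or None on a miss."""
        result = self._store.get(key)
        if result is None:
            self.misses += 1
        else:
            self.hits += 1
        return result

    def submit(self, key: CacheKey, result: MapOutput) -> None:
        """Store an intermediate result so later tasks can reuse it."""
        self._store[key] = result


def word_count_map(lines: List[str]) -> MapOutput:
    """A simple map function: emit (word, 1) for every word."""
    return [(word, 1) for line in lines for word in line.split()]


def run_map_task(manager: CacheManager, split_id: str, lines: List[str]) -> MapOutput:
    """Query the cache before computing; submit the result after computing."""
    key = (split_id, "word_count_map")
    cached = manager.query(key)
    if cached is not None:
        return cached               # reuse instead of recomputing
    result = word_count_map(lines)
    manager.submit(key, result)
    return result
```

In this sketch, running the same task twice on the same input split computes the map output once and serves the second call from the cache. Keying on both the input and the operation is an assumption made here so that different operations on the same data do not collide.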
Pages: 39-50
Page count: 12
Related papers
50 entries in total
  • [31] Improving Network Traffic in MapReduce for Big Data Applications
    Gawande, Priya
    Shaikh, Nuzhaft
    2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, AND OPTIMIZATION TECHNIQUES (ICEEOT), 2016, : 2979 - 2983
  • [32] Big Data Quality Scoring for Structured Data Using MapReduce
    Wu, Yalong
    Dhamodharan, Shalini
    Ghattamaneni, Vinuthna
    Kokila, Narmada
    Pathakamuri, Chandrika
    Carter, Timothy
    Tian, Pu
    Sha, Kewei
    2024 33RD INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS, ICCCN 2024, 2024,
  • [33] The Scalability of Volunteer Computing for MapReduce Big Data Applications
    Li, Wei
    Guo, William
    DATA SCIENCE, PT 1, 2017, 727 : 153 - 165
  • [34] Investigation and Characterization of MapReduce Applications for Big Data Analytics
    Li, Y.
    Lam, T. B. V.
    Do, T. V. Van
    Chakka, R.
    Rotter, C.
    JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH, 2018, 77 (09): : 493 - 498
  • [35] Clustering on Big Data Using Hadoop MapReduce
    Akthar, Nadeem
    Ahamad, Mohd Vasim
    Khan, Shahbaz
    2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 789 - 795
  • [36] Distributed Adaptive Routing for Big-Data Applications Running on Data Center Networks
    Zahavi, Eitan
    Keslassy, Isaac
    Kolodny, Avinoam
    PROCEEDINGS OF THE EIGHTH ACM/IEEE SYMPOSIUM ON ARCHITECTURES FOR NETWORKING AND COMMUNICATIONS SYSTEMS (ANCS'12), 2012, : 99 - 110
  • [37] Optimization enabled LeNet for big data classification using MapReduce framework on COVID-19 data
    Patle B.R.
    V V.
    Australian Journal of Electrical and Electronics Engineering, 2024, 21 (04): : 409 - 424
  • [38] Near real-time big-data processing for data driven applications
    Kampars, Janis
    Grabis, Janis
    2017 3RD INTERNATIONAL CONFERENCE ON BIG DATA INNOVATIONS AND APPLICATIONS (INNOVATE-DATA), 2017, : 35 - 42
  • [39] A Cognitive Oriented Framework for IoT Big-data Management Prospective
    Mishra, Nilamadhab
    Lin, Chung-Chih
    Chang, Hsien-Tsung
    2014 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION PROBLEM-SOLVING (ICCP), 2014, : 124 - 127
  • [40] Advancing manufacturing systems with big-data analytics: A conceptual framework
    Kozjek, Dominik
    Vrabic, Rok
    Rihtarsic, Borut
    Lavrac, Nada
    Butala, Peter
    INTERNATIONAL JOURNAL OF COMPUTER INTEGRATED MANUFACTURING, 2020, 33 (02) : 169 - 188